Abstract

This  thesis  proposes  a  unified  approach  for  controlling  a  group  of  robots  to  reach 
a  goal  configuration  in  a  decentralized  fashion.  As  a  motivating  example,  robots 
are  controlled  to  spread  out  over  an  environment  to  provide  sensor  coverage.  This 
example  gives  rise  to  a  cost  function  that  is  shown  to  be  of  a  surprisingly  general 
nature.  By  changing  a  single  free  parameter,  the  cost  function  captures  a  variety 
of  different  multi-robot  objectives  which  were  previously  seen  as  unrelated.  Stable, 
distributed  controllers  are  generated  by  taking  the  gradient  of  this  cost  function.  Two 
fundamental  classes  of  multi-robot  behaviors  are  delineated  based  on  the  convexity  of 
the  underlying  cost  function.  Convex  cost  functions  lead  to  consensus  (all  robots  move 
to  the  same  position),  while  any  other  behavior  requires  a  nonconvex  cost  function. 

The  multi-robot  controllers  are  then  augmented  with  a  stable  on-line  learning 
mechanism to adapt to unknown features in the environment. In a sensor coverage
application, this allows robots to learn where in the environment they are most
needed, and to aggregate in those areas. The learning mechanism uses communication
between neighboring robots to enable distributed learning over the multi-robot
system  in  a  provably  convergent  way. 

Three multi-robot controllers are then implemented on three different robot
platforms. Firstly, a controller for deploying robots in an environment to provide sensor
coverage is implemented on a group of 16 mobile robots. They learn to aggregate
around a light source while covering the environment. Secondly, a controller is
implemented for deploying a group of three flying robots with downward facing cameras
to monitor an environment on the ground. Thirdly, the multi-robot model is used
as a basis for modeling the behavior of a herd of cows using a system identification
approach. The controllers in this thesis are distributed, theoretically proven, and
implemented on multi-robot platforms.

Thesis  Supervisor:  Daniela  Rus 

Title:  Professor  of  Electrical  Engineering  and  Computer  Science 
Chapter 1


Introduction 


The  robots  of  the  future  will  be  numerous  and  talkative.  This  is  an  inevitable  result  of 
the  decreasing  cost  of  electronics,  combined  with  the  increasing  ubiquity  of  networking 
technology.  Large  networks  of  robots  have  far  reaching  technological  potential.  They 
will  change  the  way  we  grow  food,  manufacture  products,  and  wage  wars;  they  will 
help us to look after the environment, to collect scientific data, and will allow us
to  explore  unfamiliar  places.  More  generally,  multi-robot  systems  will  enable  direct 
human  influence  over  large  scale  natural  phenomena.  However,  before  we  can  see  the 
benefits  of  multi-robot  technology,  we  must  first  answer  a  basic  question:  how  do  we 
control  all  of  these  gregarious  robots?  What  should  we  make  them  say  to  one  another, 
and  how  should  they  act  upon  the  information  they  get  from  their  neighbors? 

In  this  thesis  we  define,  analyze,  and  implement  a  multi-robot  control  approach 
based  on  distributed  optimization  of  a  cost  function  to  stably  and  adaptively  control 
the  movement  of  a  group  of  robots  towards  a  goal.  We  define  a  cost  function  that  can 
lead  to  various  different  multi-robot  behaviors  by  changing  a  single  free  parameter. 
Using  the  cost  function,  we  design  gradient  descent  controllers  so  that  each  robot 
moves  to  decrease  the  value  of  the  cost  function.  Our  controllers  allow  for  simple, 
general stability and convergence results, and lead to robust, practical control
strategies that can be implemented on robots with limited computational resources. Our
controllers are adaptive to failures. If one robot fails, the others will automatically
reconfigure to compensate for it. They also are adaptive to slowly changing
environments. The robots will move to try to maintain an optimal configuration in response
to  changing  environmental  conditions.  The  control  approach  in  this  thesis  can  be 
used,  for  example,  to  deploy  a  group  of  hovering  robots  (i.e.  autonomous  helicopters) 
with  downward  facing  cameras  to  collectively  monitor  an  area  on  the  ground.  With 
knowledge  of  its  own  position  and  the  positions  of  its  neighbors,  each  robot  moves  to 
maximize  its  own  field  of  view,  while  not  overlapping  its  field  of  view  too  much  with 
neighboring  robots.  The  robots  deploy  themselves  over  the  area,  spreading  outward 
and  upward  until  the  whole  area  is  covered  by  the  group.  In  many  applications,  as  in 
this  example,  the  use  of  a  multi-robot  system  offers  superior  robustness,  speed,  and 
sensor  resolution  over  a  single  robot  solution. 


The design, analysis, and implementation of multi-robot controllers are
accompanied by difficulties peculiar to the multi-robot setting. Firstly, in designing
multi-robot controllers, one must always consider the constraint that each robot has only
partial information, yet the whole group must converge to a desired configuration.
This demands a careful composition of communication, control, and sensing.
Secondly, proving analytical properties of multi-robot systems is difficult because their
dynamics are often nonlinear, and they are coupled through a network which changes
over time. Thirdly, implementing multi-robot controllers requires maintaining
multiple robot platforms simultaneously, and implementing algorithms over real ad hoc
wireless networks. We overcome each of these difficulties in this thesis. We design
controllers by focusing our attention on multi-robot tasks that can be quantified by a
cost function, and by using gradient descent controllers to minimize the cost function.
We analyze the performance of our controllers using a combination of analysis tools
from optimization, Lyapunov stability theory, and graph theory to prove theorems
concerning the asymptotic convergence of the robots to their final configurations, the
rate of their convergence, and the optimality of their final configurations. Finally,
we implement multi-robot algorithms on three different hardware platforms (shown
in Figure 1-1): a group of ground robots, a group of flying quad-rotor robots, and
sensor/actuator boxes mounted to the heads of cows in a herd.
Figure 1-1: The robot platforms used in the three case studies are shown:
(a) Ground Robots, (b) Quad-Rotors, (c) Cows.


1.1  Approach 


As  an  archetypal  problem,  we  consider  the  deployment  of  multiple  robots  over  an 
environment  for  distributed  surveillance  and  sensing,  a  task  we  call  coverage.  The 
example  described  above  with  hovering  robots  is  a  typical  instance  of  coverage.  We 
show that coverage is closely related to a number of other multi-robot tasks,
including consensus (all the robots move to the same position), and herding (the robots
aggregate in a group without collisions). The link between these tasks is elucidated by
showing  that  they  all  result  from  variations  on  the  same  optimization  problem,  and 
controllers  for  each  of  these  tasks  are  shown  to  be  obtained  by  taking  the  negative 
gradient  of  the  cost  function.  We  then  consider  the  situation  in  which  robots  must 
learn  a  function  in  the  environment  to  carry  out  their  control  task.  For  example,  in 
a  sensor  coverage  task,  the  robots  learn  the  areas  that  require  the  most  dense  sensor 
coverage and move to aggregate in those areas. Learning is incorporated in a
distributed way with provable stability and performance guarantees. Building upon this
foundation,  we  pursue  three  detailed  multi-robot  case  studies.  The  controllers  for 
these  case  studies  are  implemented  on  three  different  kinds  of  multi-robot  platforms: 
a swarm of ground robots (Figure 1-1(a)), a group of flying quad-rotor robots (Figure
1-1(b)), and a herd of cows outfitted with sensing and control boxes (Figure 1-1(c)).
Scope and Limitations


In  general,  the  control  of  multi-robot  systems  is  an  intractably  large  problem  space. 
One  must  specify  the  kind  of  multi-robot  systems  and  the  class  of  multi-robot  tasks 
to  make  meaningful  headway.  This  thesis  focuses  on  multi-robot  systems  composed 
of  identical  robots  with  simple  dynamics.  Furthermore  the  tasks  we  consider  are 
those  that  can  be  formulated  as  the  optimization  of  a  cost  function  that  depends 
upon  the  positions  of  the  robots.  In  other  words,  our  problem  space  is  to  drive  the 
robots  to  a  final  goal  configuration  (or  to  a  set  of  possible  goal  configurations)  and 
remain  fixed  unless  a  robot  fails  or  the  environment  changes,  in  which  case  the  robots 
adjust  to  find  a  new  goal  configuration.  This  precludes,  for  example,  algorithmic 
tasks  involving  temporal  logic  specifications,  such  as,  “move  in  formation  to  area  A, 
explore  area  A  for  T  hours,  then  return  in  formation,  while  avoiding  areas  B  and 
C.”  It  also  precludes  tasks  in  which  not  only  the  final  positions  of  the  robots  are 
important,  but  also  their  trajectories,  for  example,  moving  the  robots  to  cover  an 
environment  using  the  shortest  possible  paths.  These  kinds  of  tasks  might  also  be 
phrased  as  optimizations,  though  not  with  cost  functions  that  only  depend  on  the 
positions  of  the  robots. 

Notice that our problem space naturally divides into regimes based upon aggregation.
In the goal configuration, the robots are either spread apart over the environment,
in which case we say they are doing coverage, or grouped
together, in which case we say they are doing consensus (if they are all occupying the
same position) or herding (if they are not all occupying the same position). This
exhausts  the  possible  species  of  goal  configurations  in  our  problem  space.  Despite 
this  simple  characterization,  we  will  see  that  rather  complex  controllers  and  tasks  are 
possible  in  this  problem  space. 

Gradient  Based  Control 

The  main  feature  that  defines  our  class  of  multi-robot  problems  is  that  they  can  be 
cast  in  terms  of  the  optimization  of  a  cost  function  that  depends  on  the  positions 
of the robots. We derive distributed controllers by taking the negative gradient of
the  cost  function  so  that  the  robots  are  always  moving  to  decrease  the  cost  of  their 
positions.  Thus  the  closed-loop  system  is  a  gradient  system,  which  is  a  dynamical 
system  whose  dynamics  are  given  by  the  negative  gradient  of  a  cost  function. 

Restricting  our  scope  to  gradient  systems  still  allows  for  considerable  complexity  in 
behavior, but has two pronounced benefits. Firstly, it ties the closed-loop system to an
optimization  problem.  Optimization  problems  have  a  powerful  set  of  mathematical 
tools  that  can  be  used  to  prove  fundamental  properties  of  the  closed-loop  system. 
Secondly,  gradient  systems  carry  with  them  particularly  strong  and  simple  stability 
guarantees  which  can  be  deduced  from  the  properties  of  the  underlying  cost  function 
alone.  This  leads  to  goal  directed  behavior,  because  gradient  controllers  move  the 
robots  to  local  minima  of  the  cost  function  which  represent  goal  states. 
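
As a minimal sketch of a gradient system, assume an illustrative cost H(p) equal to half the sum of squared distances between all pairs of robots. This cost is chosen only because its gradient has a simple closed form; it is not the cost function developed in this thesis, and the names cost, gradient, and simulate are hypothetical. The code integrates the closed-loop dynamics p_dot = -grad H(p) with Euler steps and records the cost, which should not increase along the trajectory.

```python
import numpy as np


def cost(p):
    """Illustrative cost: half the sum of squared distances over all robot pairs."""
    diffs = p[:, None, :] - p[None, :, :]      # shape (n, n, d)
    return 0.25 * np.sum(diffs ** 2)           # every pair appears twice, hence 1/4


def gradient(p):
    """Gradient of the cost with respect to each robot position, shape (n, d)."""
    n = p.shape[0]
    return n * p - p.sum(axis=0)               # row i is sum_j (p_i - p_j)


def simulate(p0, step=0.01, iters=500):
    """Euler integration of the gradient system p_dot = -grad H(p)."""
    p = np.array(p0, dtype=float)
    costs = [cost(p)]
    for _ in range(iters):
        p = p - step * gradient(p)             # each robot moves down the gradient
        costs.append(cost(p))
    return p, costs


if __name__ == "__main__":
    start = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 1.0]])
    final, history = simulate(start)
    print("final positions:\n", final)
    print("initial cost:", history[0], "final cost:", history[-1])
```

Because this particular cost is convex, robots started at any positions move toward their common centroid, which is a consensus configuration. The cost itself plays the role of a Lyapunov-like function for the closed-loop system.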

Incorporating  Learning 

This thesis puts a heavy emphasis on the role of learning and adaptation in
multi-robot control, which distinguishes it from other works in this area. In any group of
biological agents, learning and adaptation play a key role in the group’s behavior.
Agents learn to specialize their roles within the group, and they adapt their
behavior using knowledge they acquire about the environment. Taking inspiration from
biological  agents,  we  integrate  learning  within  multi-robot  controllers  with  rigorous 
stability  guarantees,  so  as  to  enable  more  complex  behaviors.  Control  and  learning 
are  both  represented  under  the  umbrella  of  optimization  and  are  analyzed  with  the 
tools  of  Lyapunov  stability  theory.  We  use  the  term  learning  to  specifically  mean 
tuning  the  parameters  of  a  parametric  model,  as  in  system  identification  or  adaptive 
control.  This  is  a  more  narrow  notion  of  learning  than  that  used  in  the  statistical 
learning  and  machine  learning  communities. 
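
To make this notion of learning concrete, the sketch below tunes the parameters of a linear-in-parameters model on-line from sensor measurements. It is a generic normalized gradient update of the kind used in adaptive control, not the specific adaptation law derived later in this thesis. The Gaussian basis, its width, the gain, the one-dimensional environment [0, 1], and the function names are all assumptions made for illustration.

```python
import numpy as np


def basis(q, centers, width=0.15):
    """Gaussian basis functions evaluated at a scalar location q (hypothetical choice)."""
    return np.exp(-((q - centers) ** 2) / (2.0 * width ** 2))


def adapt(a_hat, q, y, centers, gain=0.5):
    """One normalized gradient step on the squared prediction error at location q."""
    k = basis(q, centers)
    error = y - k @ a_hat                      # measurement minus model prediction
    return a_hat + gain * k * error / (1.0 + k @ k)


def run(a_true, centers, samples=2000, seed=0):
    """Tune the parameter estimate from noiseless measurements at random locations."""
    rng = np.random.default_rng(seed)
    a_hat = np.zeros_like(a_true)
    for _ in range(samples):
        q = rng.uniform(0.0, 1.0)              # location visited in the environment
        y = basis(q, centers) @ a_true         # sensor reading of the unknown function
        a_hat = adapt(a_hat, q, y, centers)
    return a_hat


if __name__ == "__main__":
    centers = np.array([0.0, 0.5, 1.0])
    a_true = np.array([1.0, 3.0, 0.5])
    print("estimated parameters:", run(a_true, centers))
```

With noiseless data and a gain between 0 and 2, this update never increases the parameter error, and the estimate approaches the true parameters when the measurement locations excite every basis function.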

In our applications learning is important for the robots to integrate sensor
information into their behavior and thereby accommodate uncertainty about the desired
task. For example, in a coverage control application in which robots spread out over
an environment to do sensing, it is useful for the robots to concentrate in areas where
their sensors are most needed. Our controllers allow the robots to learn on-line where
they  are  most  needed  and  to  automatically  concentrate  in  those  areas.  In  this  thesis 
we  also  use  learning  in  the  form  of  parameter  tuning  to  build  models  of  cow  herding. 
GPS  data  from  cows  are  used  to  tune  the  parameters  of  a  model  to  fit  the  behavior 
of  the  cows,  including  their  affinity  for  one  another  and  their  preferred  paths  over  the 
environment. 


Basic  Assumptions 

In  this  thesis,  the  robots  are  assumed  to  have  simple  integrator  dynamics,  unless 
otherwise  explicitly  stated,  so  that  the  control  input  is  the  robot’s  velocity.  This  kind 
of  kinematic  model  is  common  in  the  multi-robot  control  literature,  and  we  have  found 
in  repeated  experimental  implementations  that  it  is  a  good  approximation  provided 
that  the  robots  have  a  fast  inner  control  loop  to  regulate  the  velocity.  Indeed,  from  a 
design  point  of  view  it  is  desirable  to  separate  the  control  of  the  high-level  multi-robot 
behavior  from  the  low-level  dynamical  behavior  of  the  robots.  Commercially  available 
robot  platforms  typically  have  fast  low-level  control  loops  to  track  a  commanded 
velocity  or  to  drive  to  given  way  points. 
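
The following sketch illustrates why the integrator model is a reasonable approximation when the inner loop is fast. It compares an ideal integrator robot (p_dot = u) with a robot whose velocity tracks the command through a first-order inner loop (v_dot = inner_gain * (u - v)). The proportional high-level command, the gains, and the time step are assumptions chosen for illustration and do not describe any of the platforms used in this thesis.

```python
import numpy as np


def simulate(goal, inner_gain, dt=0.001, t_final=10.0):
    """Return the largest gap between the ideal integrator and the velocity-tracking robot."""
    goal = np.asarray(goal, dtype=float)
    p_ideal = np.zeros_like(goal)              # ideal model: p_dot = u
    p_real = np.zeros_like(goal)               # real robot: p_dot = v
    v_real = np.zeros_like(goal)               #             v_dot = inner_gain * (u - v)
    max_gap = 0.0
    for _ in range(int(t_final / dt)):
        u_ideal = goal - p_ideal               # high-level command for the ideal robot
        u_real = goal - p_real                 # same control law applied to the real robot
        p_ideal = p_ideal + dt * u_ideal
        v_real = v_real + dt * inner_gain * (u_real - v_real)
        p_real = p_real + dt * v_real
        max_gap = max(max_gap, float(np.linalg.norm(p_real - p_ideal)))
    return max_gap


if __name__ == "__main__":
    goal = np.array([1.0, 1.0])
    for gain in (2.0, 10.0, 50.0):
        print("inner gain", gain, "max deviation from integrator model", simulate(goal, gain))
```

As the inner-loop gain grows, the trajectory of the velocity-tracking robot stays closer to that of the ideal integrator, which is the separation between high-level and low-level control described above.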

Also,  in  this  thesis  the  robots  interact  with  one  another  over  a  network  and  are 
assumed  to  be  able  to  exchange  state  information,  such  as  position,  only  with  their 
immediate  neighbors  in  the  network.  We  do  not  assume  the  presence  of  a  point-to- 
point  routing  protocol  to  move  messages  between  two  specific  robots  in  the  network. 
In  an  implementation,  this  means  that  robots  simply  broadcast  their  states  at  each 
time step, and any other robot within communication range uses the broadcast
state as required by its controller. Also, we do not explicitly consider network delays,
bandwidth constraints, packet losses, and other real but difficult-to-model
limitations on network performance. However, in multiple hardware implementations our
controllers  are  shown  to  perform  well  despite  the  limitations  of  real  networks. 
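
The broadcast model can be sketched as follows, assuming an idealized disk communication range with no delays or losses. The function names and the fixed range are hypothetical; a real ad hoc network would not behave this cleanly.

```python
import numpy as np


def neighbors_in_range(positions, comm_range):
    """For each robot, list the indices of the other robots within communication range."""
    positions = np.asarray(positions, dtype=float)
    dists = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=2)
    n = len(positions)
    return [[j for j in range(n) if j != i and dists[i, j] <= comm_range]
            for i in range(n)]


def broadcast_round(positions, comm_range):
    """One time step: every robot broadcasts its state and stores whatever it hears."""
    positions = np.asarray(positions, dtype=float)
    inbox = {}
    for i, heard in enumerate(neighbors_in_range(positions, comm_range)):
        # No routing: robot i only sees states broadcast by robots within range.
        inbox[i] = {j: positions[j].copy() for j in heard}
    return inbox


if __name__ == "__main__":
    robots = [[0.0, 0.0], [1.0, 0.0], [5.0, 0.0]]
    for robot, states in broadcast_round(robots, comm_range=1.5).items():
        print("robot", robot, "received states from", sorted(states))
```

Each controller then computes its velocity command from its own state and the contents of its inbox only, which is what makes the resulting control law distributed.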
1.2 Applications


The  multi-robot  control  strategies  in  this  thesis  have  two  key  qualities  that  make  them 
superior to single-robot systems for many applications. Firstly, they are inherently
tolerant  to  the  failure  of  any  individual  in  the  group,  whereas  in  a  single  robot  system, 
if  the  robot  fails  the  mission  fails.  Secondly,  multi-robot  systems  can  carry  out  tasks 
over  large  geographical  areas  more  quickly  and  with  greater  resolution  than  single 
robots.  The  robots  can  move  in  parallel  to  cover  an  environment,  while  a  single 
robot  must  traverse  the  environment  in  a  serial  fashion.  These  qualities  make  our 
multi-robot  controllers  useful  in  a  broad  range  of  applications,  as  described  below. 

Large  Scale  Scientific  Data  Collection 

Scientists  who  study  large  scale  environmental  systems  such  as  ecologists,  geologists, 
archaeologists,  oceanographers,  and  meteorologists  spend  an  inordinate  amount  of 
time  and  effort  to  collect  data  over  large  geographical  areas.  This  laborious  activity 
could  be  automated  by  the  use  of  a  multi-robot  system.  To  that  end,  the  controllers 
considered  in  this  thesis  can  be  used  to  deploy  underwater  robots  over  a  coral  reef  to 
monitor  coral  health,  or  to  deploy  flying  robots  to  take  measurements  over  geological 
formations,  forests,  or  archaeological  sites.  Such  controllers  could  also  be  used  by 
collaborating  rovers  to  explore  the  surface  of  other  planets  or  moons.  The  ability  to 
collect  scientific  data  over  large  geographical  areas  will  accelerate  scientific  progress 
and  facilitate  the  study  of  physical  and  biological  processes  that  take  place  beyond 
the  scale  of  current  measurement  apparatus. 

Distributed  Surveillance  and  Servicing 

The control strategies in this thesis are also useful for distributed surveillance or
servicing. In many scenarios we want to automatically monitor an area for security,
to identify irregular activity, or to provide some service to users in the area. For
example, in a military context a group of robots (flying robots, ground robots, or a
mixed group) could use our controllers to position themselves over a battlefield. They
could be used to monitor the movement and concentration of enemy combatants, or to
provide mapping information and navigation information about the battlefield. The
robots could also be used to provide network support to troops and vehicles on the
battlefield, acting as self-positioning network routers. This is useful in civilian
applications as well. For example, they could be used in disaster relief to provide mapping
information or to act as an ad hoc communication network for rescue workers. Robots
can also be deployed using our controllers over an urban area to monitor human
activity, or to provide wireless connectivity for computers and mobile phones. Our
controllers could also be used to position flying robots with cameras over, for example, sporting
events or parades to provide media coverage. The same controllers can also be used
for multi-robot servicing tasks, for example, to position oil clean-up robots over an
oil spill so that they clean up the spill in minimum time, or to position de-mining
robots to service a mine field in minimum time.

Formations  for  Traveling  in  Groups 

The controllers in this thesis are also relevant to formation flying applications.
Maintaining a formation is useful for groups of aerial and ground robots as a means of
traveling  in  a  team.  This  is  particularly  useful  in  a  semi-automated  scenario  in  which 
a  human  operator  controls  a  group  of  vehicles.  Directly  controlling  each  vehicle  in  the 
group  is  too  complicated  a  task  for  a  single  operator,  but  if  each  vehicle  is  equipped 
with  a  formation  controller,  the  pilot  can  control  the  group  abstracted  as  a  single 
entity. 

Modeling  Biological  Group  Behavior 

The models and tools described in this work can be seen through an engineer’s eyes as
controllers for groups of robots, but they can also be seen through a scientist’s eyes as
mathematical  models  of  the  dynamics  of  groups  of  biological  agents.  They  can  be 
used  to  predict  and  describe  foraging,  predation,  herding,  and  other  group  behaviors 
in  nature.  For  example,  one  of  the  detailed  case  studies  in  this  thesis  deals  with 
modeling  the  herding  behavior  of  cows.  The  modeling  technique  can  also  be  used 
to generate dynamical models of groups of other animals, crowds of people, or even
traffic  flow.  Dynamical  models  of  biological  group  behavior  can  also  be  used  to  drive 
groups  of  robots  to  mimic  the  behavior  of  natural  groups.  This  may  be  useful  in 
reproducing  collaborative  behaviors  exhibited  in  natural  systems,  or  in  producing 
decoy  robots  to  participate  with  natural  or  engineered  groups,  and  even  to  influence 
the  behavior  of  natural  groups  [39].  Therefore,  in  this  work  the  word  robot  should  be 
seen  as  including  natural  autonomous  agents  such  as  people,  cells,  or  cows. 


1.3  State  of  the  Art 

The  study  of  controlling  multiple  robots  with  the  ability  to  communicate  over  a 
network  has  become  a  particularly  important  part  of  the  controls  and  robotics  research 
communities.  Most  research  has  focused  on  prototypical  tasks.  These  prototypical 
tasks can be stated as follows (see footnote 1).

1. Coverage - the deployment of a group of agents over an environment, or, more
generally, the dispersion of agents’ states over a state space.

2. Consensus - the convergence of a group of agents to a common point, or, more
generally, convergence of the states of a group of agents to a common final
vector or manifold. Consensus can be seen as in opposition to coverage, since in
the former agents come together and in the latter they spread apart. This
phenomenon is often called by other names including rendezvous, agreement, and
flocking, and is closely related to gossip algorithms in distributed computation,
and oscillator synchronization.

3. Herding - the aggregation of a group of agents in such a way that they do not get
too far from, nor too close to, one another. This can be seen as a composition
of coverage and consensus.

Footnote 1: These categories, including their names, are my convention and are necessarily
somewhat arbitrary.
In the literature, these tasks have emerged separately from one another and are
usually  treated  as  entirely  different  problems.  This  thesis  shows  that  they  all  derive 
from  the  same  cost  function  and  are  in  fact  all  different  regimes  in  a  continuum  of 
behaviors.  The  main  difference  between  them  is  in  the  convexity  of  the  underlying 
cost  function.  We  discuss  the  relevant  previous  work  from  these  three  areas  in  detail 
in  Chapter  2,  Section  2.2. 

The  state  of  the  art  currently  is  to  analyze  a  variation  on  one  of  these  three  tasks, 
or  to  implement  an  instantiation  of  one  of  them  on  a  multi-robot  platform.  There 
are  few  works  which  combine  these  tasks  to  design  or  implement  more  complex  robot 
behaviors  in  a  multi-robot  setting,  as  is  done  in  this  thesis.  We  combine  coverage 
with  learning,  which  requires  consensus,  thereby  composing  two  of  these  behaviors  in 
a  provably  stable  way  to  create  a  more  complex  behavior.  We  also  combine  herding 
with  learning  to  produce  models  of  complex  cow  herd  motion. 

1.4  Contributions 

The  contributions  of  this  thesis  are  as  follows. 

1.  Unified  Optimization  Formulation -  An  optimization  problem  is  presented  which 
is  general  enough  to  represent  all  of  the  three  categories  of  multi-robot  problems 
described  above.  This  illuminates  the  common  nature  behind  these  multi-robot 
phenomena  and  allows  for  their  treatment  under  a  unified  gradient  optimization 
setting. 

2.  Convexity  and  Consensus -  We  prove  that  if  the  cost  function  representing  a 
multi-robot  problem  is  convex,  then  one  of  its  global  optima  is  consensus  (i.e. 
all  robots  occupying  the  same  state).  Conversely,  if  we  wish  to  solve  a  problem 
for  which  consensus  is  not  optimal,  for  example  coverage  or  herding,  we  know 
that  the  underlying  cost  function  must  be  nonconvex.  This  has  important 
ramifications  for  reaching  global  optima  using  gradient  based  controllers. 

3. On-Line Learning - Learning in the form of on-line parameter adaptation is
incorporated into our multi-robot controller. We use consensus in a novel way
to  enable  learning  in  the  distributed  multi-robot  setting,  so  that  each  robot  in 
the  group  learns  asymptotically  as  well  as  if  it  had  global  information.  Stability 
and  convergence  properties  are  proved  using  a  Lyapunov  approach.  Learning 
allows  for  provably  stable  multi-robot  behavior  that  can  adapt  to  uncertain  and 
slowly  changing  environments. 

4. Implementation of Learning Coverage Controller on Mobile Robots - A coverage
controller with on-line learning is implemented on a group of SwarmBots (Figure
1-1(a)). A group of 16 SwarmBots learns the distribution of sensor information
in the environment while spreading out to cover the environment.

5. Implementation of Camera Coverage Controller on Flying Quad-Rotor Robots -
Using the unified optimization formulation, a coverage controller is designed
for controlling a group of flying robots with downward facing cameras. The
controller is implemented and tested on a group of three flying quad-rotor robots
(Figure 1-1(b)).

6. Implementation of Model Learning for a Herd of Cows - The multi-robot model
is used for modeling the herding behavior of cows. System identification
techniques are used to tune the parameters of the model using GPS data collected
from 3-10 actual cows (Figure 1-1(c)).
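
The link between convexity and consensus in contribution 2 can be checked numerically under one extra assumption that is not stated in this chapter: that the cost does not change when the robots are relabeled. Averaging a configuration over all relabelings places every robot at the centroid, and Jensen's inequality then implies that a convex cost at this consensus configuration is no larger than at the original configuration. The sketch below tests this on an illustrative convex cost, which is not the cost function of this thesis; all names are hypothetical.

```python
import itertools

import numpy as np


def convex_symmetric_cost(p, target):
    """Illustrative convex cost that is unchanged when the robots are relabeled."""
    pairs = itertools.combinations(range(len(p)), 2)
    spread = sum(np.sum((p[i] - p[j]) ** 2) for i, j in pairs)
    tracking = np.sum(np.linalg.norm(p - target, axis=1))
    return spread + tracking


def consensus_at_centroid(p):
    """Configuration in which every robot sits at the centroid of p."""
    return np.tile(p.mean(axis=0), (len(p), 1))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    target = np.array([0.5, -0.2])
    for _ in range(5):
        p = rng.normal(size=(4, 2))
        print(convex_symmetric_cost(p, target),
              ">=", convex_symmetric_cost(consensus_at_centroid(p), target))
```

The contrapositive is the useful direction for coverage and herding: if consensus is not an acceptable optimum, a cost of this kind cannot be convex, so a gradient controller can only be expected to reach local minima.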


1.5  Organization 

This thesis begins by reviewing an existing multi-robot coverage controller that uses
the geometric notion of a Voronoi tessellation. After stating and proving the
basic qualities of this existing controller, we propose an optimization problem which
is shown to incorporate the Voronoi controller as a special limiting case. This
optimization problem is shown to be of a surprisingly general nature, specializing to give
controller designs for a number of different multi-robot problems. After posing the
basic optimization problem, we consider an important extension to the case where
some information in the environment is lacking and must be learned through sensor
measurements. We proceed by specializing the general optimization problem to three
specific applications. Each application involves formulating the correct optimization
problem, deriving a gradient controller, verifying stability properties, building a
numerical simulation, and finally, implementing the controller on a multi-robot platform.
The relation among the main parts of the thesis is shown in Figure 1-2. Thus the
thesis begins from a theoretical point of view and proceeds towards more practical
matters. After a short introduction, each chapter has an itemized summary of
contributions and a previous work section, and concludes with a synopsis of the important
points in the chapter.

Figure 1-2: This figure shows the relation of the main optimization problem to each
of our three case studies.

Chapter  2  gives  the  background  necessary  for  the  rest  of  the  thesis.  It  starts  by 
reviewing  the  previous  work  relevant  to  the  thesis  and  describing  the  contributions  of 
the  thesis  in  the  context  of  the  existing  literature.  It  then  defines  the  mathematical 
notation  that  is  used  throughout  the  thesis,  and  states  two  theorems  relating  to  the 
convergence  and  stability  of  gradient  systems  that  will  be  used  repeatedly  in  later 
chapters. We then derive a well-known Voronoi based controller and prove its
convergence properties. The material in this chapter is not a novel research contribution.
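
For orientation, Voronoi based coverage controllers are commonly written as a proportional law that moves each robot toward the centroid of its own Voronoi cell. The sketch below approximates that law on a grid of sample points, so no explicit tessellation is constructed. The unit-square environment, the grid resolution, the gain, the time step, and the function names are assumptions for illustration; the derivation in Chapter 2 is stated in terms of integrals over the cells.

```python
import numpy as np


def unit_square_grid(resolution=40):
    """Cell-centered sample points covering the unit square."""
    axis = (np.arange(resolution) + 0.5) / resolution
    xx, yy = np.meshgrid(axis, axis)
    return np.column_stack([xx.ravel(), yy.ravel()])


def voronoi_centroids(positions, grid_points, density):
    """Weighted centroid of each robot's Voronoi cell, approximated on the grid."""
    positions = np.asarray(positions, dtype=float)
    dists = np.linalg.norm(grid_points[:, None, :] - positions[None, :, :], axis=2)
    owner = np.argmin(dists, axis=1)           # the nearest robot owns each grid point
    centroids = positions.copy()
    for i in range(len(positions)):
        mask = owner == i
        mass = density[mask].sum()
        if mass > 0:                           # robots with empty cells stay in place
            weights = density[mask][:, None]
            centroids[i] = (weights * grid_points[mask]).sum(axis=0) / mass
    return centroids


def step(positions, grid_points, density, gain=1.0, dt=0.1):
    """Euler step of p_dot = gain * (centroid - p) for every robot."""
    positions = np.asarray(positions, dtype=float)
    centroids = voronoi_centroids(positions, grid_points, density)
    return positions + dt * gain * (centroids - positions)


if __name__ == "__main__":
    grid = unit_square_grid()
    density = np.ones(len(grid))               # uniform importance over the environment
    p = np.array([[0.1, 0.1], [0.2, 0.9], [0.15, 0.5]])
    for _ in range(200):
        p = step(p, grid, density)
    print("final positions:\n", p)
```

A configuration in which every robot already sits at the centroid of its cell is a fixed point of this update.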

Chapter  3  provides  the  main  theoretical  foundation  of  the  thesis  by  posing  a 
general  optimization  problem.  A  cost  function  is  formulated  using  the  motivating 
example  of  coverage  control.  Controllers  are  obtained  by  taking  the  gradient  of  the 
cost function, resulting in nonlinear controllers which involve computing an integral
of some quantity over the environment. It is shown that the coverage cost function
can  be  seen  in  a  more  general  light  as  being  relevant  to  consensus,  herding,  and 
other  multi-robot  control  tasks.  The  chapter  draws  attention  to  the  way  in  which 
sensor  measurements  are  combined  from  different  robots,  and  in  so  doing  poses  the 
idea  of  a  mixing  function.  It  is  shown  that  the  choice  of  mixing  function  roughly 
dictates  how  tightly  the  robots  aggregate.  A  parameterized  class  of  mixing  functions 
is  proposed  which  is  shown  to  unify  and  extend  several  different  multi-robot  control 
strategies,  including  ones  with  geometric  interpretations,  probabilistic  interpretations, 
and  potential  field  interpretations.  This  chapter  also  shows  that  the  Voronoi  based 
controller  can  be  approximated  arbitrarily  well  by  a  smooth  controller  that  does 
not  require  the  computation  of  a  Voronoi  tessellation.  The  chapter  concludes  by 
formally  delineating  two  classes  of  multi-robot  problems:  consensus  problems  and 
non-consensus  problems.  Coverage  control  is  shown  to  be  a  non-consensus  problem, 
which  therefore  requires  the  optimization  of  a  nonconvex  cost  function. 
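
The idea of a mixing function with a single free parameter can be illustrated with a power-sum family. This summary does not give the formula used in Chapter 3, so the form below is an assumption made purely for illustration: positive per-robot sensing costs f_i are combined as (sum_i f_i^alpha)^(1/alpha) with alpha < 0. As alpha tends to minus infinity this smooth function approaches min_i f_i, the combination in which only the nearest robot counts, as in a Voronoi partition.

```python
import numpy as np


def mixing(f, alpha):
    """Power-sum mixing of positive per-robot costs f with free parameter alpha < 0."""
    f = np.asarray(f, dtype=float)
    return np.sum(f ** alpha) ** (1.0 / alpha)


if __name__ == "__main__":
    costs = [1.0, 2.0, 4.0]                    # hypothetical sensing costs at one point
    for alpha in (-1.0, -5.0, -50.0):
        print("alpha =", alpha, "mixed cost =", mixing(costs, alpha))
    print("minimum cost =", min(costs))
```

For finite negative alpha the mixed cost is smooth in every f_i and never exceeds the minimum, which is one way a smooth controller can approximate a Voronoi based one without computing a tessellation.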

Chapter  4  completes  the  theoretical  part  of  the  thesis  by  considering  how  to 
augment the multi-robot controllers from Chapter 3 to include learning. Learning
is  first  incorporated  into  the  standard  Voronoi  based  controller  from  Chapter  2.  We 
then  apply  the  learning  architecture  to  augment  the  more  general  class  of  gradient 
controllers  seen  in  Chapter  3.  We  frame  the  learning  algorithm  as  a  matter  of  adapting 
parameters  of  a  parametric  model  on-line  as  data  is  collected.  Additionally,  the 
algorithm  leverages  communication  among  neighboring  robots  to  facilitate  distributed 
learning.  A  consensus  method  propagates  the  learning  parameters  from  any  one  robot 
throughout  the  network  so  that  every  robot  learns  asymptotically  as  well  as  if  it  had 
global  information.  The  controller  and  learning  algorithm  are  then  analyzed  in  the 
same  mathematical  context  using  Lyapunov  stability  theory.  We  address  questions  of 
convergence of the learning algorithm and convergence of the robots' positions to their
goal  configuration.  Rates  of  convergence  and  conditions  for  asymptotically  perfect 
learning  performance  are  also  investigated  and  a  number  of  different  stable  learning 
algorithms  are  proposed. 

Chapter 5 uses the theory from both Chapters 3 and 4 to implement a Voronoi
based controller that incorporates learning. The controller drives a group of robots to
spread  over  an  environment  while  aggregating  in  areas  of  high  sensory  interest.  The 
controller  learns  the  areas  of  interest  from  sensor  measurements  while  simultaneously 
driving  the  robots  to  minimize  a  cost  function  representing  the  surveillance  cost  of  the 
group.  The  algorithm  is  implemented  on  a  team  of  16  mobile  robots.  In  experiments, 
the  robots  repeatably  and  effectively  learned  the  distribution  of  sensory  interest  while 
covering  the  environment. 

Chapter 6 uses the theory presented in Chapter 3 to design a controller to deploy
hovering robots with downward facing cameras to collectively monitor an environment.
Information per pixel is proposed as a general optimization criterion for
multi-camera placement problems. This metric is used to derive a specific cost function
for multiple downward facing cameras mounted on hovering robot platforms. A
controller is derived by taking the negative gradient of this cost function, and convergence
is proved with the theorems from Chapter 2. The controller is implemented on
three  flying  quad-rotor  robots.  Results  of  the  robot  experiments  are  presented  and 
compared  with  simulation  results. 

In  Chapter  7  the  multi-robot  model  is  adapted  to  model  the  behavior  of  cows  in 
a  herd.  Least  Squares  system  identification  is  applied  to  tune  the  parameters  of  the 
model  to  fit  the  behavior  of  an  actual  herd  of  cows.  The  herd  model  describes  the 
interaction  between  agents  using  a  parameterized  nonlinear  force  law  and  captures 
the  animals’  preference  for  certain  paths  over  the  environment  as  a  parameterized 
vector-field.  To  demonstrate  the  method,  GPS  data  collected  from  three  cows  in  one 
instance,  and  ten  cows  in  another  are  used  to  tune  the  model  parameters.  Conclusions, 
lessons learned, and future work are given in Chapter 8.

Chapter 2


Background 

2.1  Introduction 

This  chapter  accomplishes  four  goals:  1)  it  situates  our  work  in  the  research  literature 
of  robotics  and  controls,  2)  it  formulates  the  main  multi-robot  system  model  that  is 
referred  to  repeatedly  in  the  thesis,  3)  it  proves  two  theorems  about  the  convergence 
and  stability  of  gradient  systems  that  are  used  repeatedly  in  the  thesis,  and  4)  it 
develops  in  detail  a  previously  existing  multi-robot  coverage  controller  that  will  serve 
as  a  baseline  for  comparison  and  elaboration  throughout  the  thesis.  The  material  in 
this  chapter  does  not  constitute  novel  research  contributions. 

2.2  Previous  Work 

As  described  in  Chapter  1,  most  research  in  the  control  of  multi-robot  systems  has 
focused  on  the  following  prototypical  problems. 

1. Coverage - the deployment of a group of agents over an environment, or, more
generally, the dispersion of agents' states over a state space.

2. Consensus - the convergence of a group of agents to a common point, or, more
generally, convergence of the states of a group of agents to a common final vector
or manifold. Consensus can be seen as in opposition to coverage, since in
the former agents come together and in the latter they spread apart. This phenomenon
is often called by other names including rendezvous, agreement, and
flocking, and is closely related to gossip algorithms in distributed computation,
and oscillator synchronization.

3.  Herding -  the  aggregation  of  a  group  of  agents  in  such  a  way  that  they  do  not 
get  too  far  from,  or  too  close  to  one  another.  This  can  be  seen  as  a  composition 
of  coverage  and  consensus. 

In this thesis we show that these three problems arise from the same basic optimization
problem with different parameters or weightings on different terms. These three areas
have emerged in the research literature as separate problems, however. We will look
at the relevant literature for each of these areas and situate this thesis with respect to
it.

The  kind  of  coverage  control  considered  in  this  thesis  owes  its  beginning  to  the 
optimization  formulation  introduced  in  [26]  which  uses  the  geometrical  notion  of  a 
Voronoi  partition  to  divide  up  the  environment  among  the  robots.  This  work  itself 
adapted concepts from locational optimization [30,114], which is the study of optimally
placing industrial facilities. This, in turn, derives from a classical problem of
finding geometric median points, which has been attributed to Fermat. The coverage
controller in [26] drives the robots to reach a centroidal Voronoi configuration [70].
There are a number of other notions of coverage including the notion of painting a
sensor footprint over an environment as in [17,20,56], or of introducing sensors sequentially
in a centralized way to optimize a probabilistic quantity, as in [52,53]. This
thesis adopts the locational optimization approach for its interesting possibilities for
analysis, its connection to distributed optimization, and the resulting potential for integrating
it with graph theory (to model communication networks) and learning. The
basic idea introduced in [26] has been extended and elaborated upon considerably.
For example, [87] used a deterministic annealing technique to improve final robot
configurations, [25] extended the controller to robots with finite sensor footprints and
other realistic complications, [77] extended the controller to heterogeneous groups of
robots and nonconvex environments, [75] generalized the Voronoi partition by introducing
the Power Diagram to achieve equitable mass partitions, and [78] treated the
problem of coverage in time varying environments. Probabilistic scenarios that use the
same kind of controller have also been considered in, for example, [57], [4], and [76].
The  article  [63]  and  the  book  [14]  provide  an  excellent  consolidation  of  much  of  this 
research. 

One  common  thread  in  all  of  these  coverage  control  works  is  that  the  distribution 
of  sensory  information  in  the  environment  is  required  to  be  known  a  priori  by  all 
robots.  This  a  priori  requirement  was  first  relaxed  in  [95]  by  introducing  a  controller 
with  a  simple  memoryless  approximation  from  sensor  measurements.  The  controller 
was  demonstrated  in  hardware  experiments,  though  a  stability  proof  was  not  found. 
One  of  the  contributions  in  this  thesis  is  to  incorporate  learning  to  enable  optimal 
coverage  of  an  unfamiliar  environment.  We  formulate  the  problem  of  learning  about 
an  unknown  environment  as  an  adaptive  control  problem.  Adaptive  control  is  usually 
applied  to  the  control  of  dynamical  systems  that  are  unknown,  or  only  partially 
known. Some of the standard text books on adaptive control are [67,89,103] and a
well-known paper that deals with an adaptive control architecture that accommodates
more  general  function  approximation  techniques  is  [88].  This  thesis  leverages  the 
proof  techniques  used  in  this  body  of  work  to  obtain  Lyapunov  stability  results  for 
coverage  controllers  that  incorporate  learning,  not  of  dynamics,  but  of  some  aspect 
of  the  environment  itself. 

The second multi-robot control problem relevant to the work in this thesis is consensus.
Consensus phenomena have been studied in many fields, and appear ubiquitously
in biological systems of all scales. However, they have only recently yielded
to rigorous mathematical treatment; first in the distributed and parallel computing
community [9, 10, 107, 108] in discrete time, and more recently in the controls community
in continuous time. One of the foundational works on consensus in the controls
community  is  [46],  which  analyzes  the  well-known  flocking  model  presented  in  [109], 
and  presents  general  consensus  conditions  for  multi-agent  systems  with  switching 
sets of neighbors. Another foundational work of this genre is [71] which deals with
directed communication networks, switching sets of neighbors, and communication
time-delays. Other works in this area include [111,112], which use contraction theory
to analyze synchronization of oscillators (which can be seen as consensus on the phase
of multiple periodic systems), [13], which looked at asynchronous communication and
unbounded communication delays, [27], which investigated a model in which the influence of
one agent over another decays as a function of the distance between them, drawing
parallels with the emergence of language in isolated human populations, and [66], which
considered agents that communicate only limited state information with agents within
their line-of-sight to model flocking behavior in birds.

In  this  thesis,  consensus  plays  a  key  role  in  distributed  learning.  Unknown  factors 
in the environment, such as where the most informative data can be found, are learned
on-line in a distributed way by propagating sensor measurements gathered by each
robot around the network. The robots essentially reach a consensus on the function
they are trying to learn, each robot getting the benefit of the sensor measurements of
all  the  other  robots  in  the  network.  This  is  similar  to  distributed  filtering  techniques 
that  have  recently  been  introduced,  for  example  in  [62,117],  though  in  contrast  to 
those  works,  the  controllers  in  this  thesis  are  concerned  with  maintaining  provable 
stability  of  the  combined  learning  and  control  system. 

Herding,  the  third  multi-robot  control  problem  relevant  to  this  thesis,  has  been 
studied  under  many  variations,  and  is  usually  seen  as  a  modification  to  the  consensus 
problem. Herding is often carried out using a potential field formulation in which agents
attract each other if they are too far and repel each other if they are too close, as
in [35]. Situations in which agents only affect each other when they are within a certain
distance are treated in [104] and extensions which attempt to maintain connectivity
of the underlying communication graph in these situations are considered in [29, 116].
These systems require nonsmooth analysis techniques, for example those described
in [24,86]. Results pertaining to the distributed computation of graph connectivity are
also an important part of the latest work on herding and consensus, as in [115]. Graph
theory more generally has a long history in pure mathematics, and a well known text on
graph theory is [37]. Our application in Chapter 7 uses system identification to tune
the parameters of a herding model. Both system identification and herd models have
a rich literature separately, though there appears to be little prior work in combining the
two. One exception is the work of Correll et al. [22,23] which uses system identification
to learn the parameters of a rate equation describing the behavior of a multi-robot
system used for turbine blade inspection. Many of the results in this thesis are based
on results that have been published in [91,93,94,96-100].


2.3  Mathematical  Preliminaries 

This  section  gives  the  mathematical  notation  and  definitions  used  throughout  the 
thesis  and  states  two  basic  theorems  that  will  come  in  handy  in  later  chapters.  We 
will use $\mathbb{R}^d$ to denote the $d$-dimensional Euclidean space, and $\mathbb{R}^d_{\geq 0}$ and $\mathbb{R}^d_{>0}$ to be
the non-negative and strictly positive quadrants of $d$-dimensional Euclidean space,
respectively. We mean the symbols $>$, $<$, $\geq$ and $\leq$ to apply element-wise for vectors.
An open interval on the real line with end points $x$ and $y$ is denoted $(x, y)$ and the
closed interval $[x, y]$, with $[x, y)$ and $(x, y]$ being the intervals closed on the left, open
on the right, and open on the left, closed on the right, respectively. The symbol
$\partial \cdot$ will be used to refer to the boundary of a set and $\partial \cdot / \partial \cdot$ to refer to a partial
derivative. Real vectors will not be differentiated from scalars with a bold font, but
wherever it is not obvious we will explicitly state $v \in \mathbb{R}^d$ or $s \in \mathbb{R}$ for a vector $v$ and
scalar $s$. The vector of ones is denoted $\mathbf{1}$ and the $n \times n$ identity matrix is denoted $I_n$.
The derivative of a function $v$ with respect to time will be denoted either with the
conventional $dv/dt$, or with the shorter $\dot{v}$ notation where convenient. The $\ell^2$ norm is
denoted $\| \cdot \|$ and $\| \cdot \|_p$ gives the $\ell^p$ norm. A function $f : \Omega \mapsto \mathbb{R}$ is called Lipschitz on
$\Omega$ if there exists a constant $\beta$ such that $|f(x_2) - f(x_1)| \leq \beta \|x_2 - x_1\|$, for all points
$x_1, x_2 \in \Omega$. A function is called locally Lipschitz if, for any point $x \in \Omega$, there
exists a ball $B(x)$ centered at $x$ such that the function is Lipschitz on $B(x)$ with a
constant $\beta(x)$ that depends upon the point $x$. A sufficient condition for a differential
equation $\dot{x} = f(x)$ to have a unique solution for a given initial condition is that the
function $f$ is locally Lipschitz [14,43,49].

2.3.1 Convex Optimization


We  now  state  some  basic  definitions  from  convex  optimization  which  can  be  found 
in any standard text on the topic, for example [8]. A set $\Omega \subset \mathbb{R}^n$ is called convex if,
for any two points in $\Omega$, all points along the line segment joining them are also in $\Omega$.
Formally,

$$\alpha x + (1 - \alpha) y \in \Omega, \quad \forall x, y \in \Omega \ \text{and} \ \forall \alpha \in [0, 1]. \tag{2.1}$$

An important consequence of the convexity of $\Omega$ is that any convex combination of
points in $\Omega$ is also in $\Omega$. A convex combination of $m$ points $x_i \in \Omega$ is one of the form

$$x = \sum_{i=1}^{m} \alpha_i x_i \quad \text{where} \quad \sum_{i=1}^{m} \alpha_i = 1 \ \text{and} \ \alpha_i \geq 0 \ \forall i. \tag{2.2}$$

A function $f : \Omega \mapsto \mathbb{R}$ is called convex if

$$f(\alpha x + (1 - \alpha) y) \leq \alpha f(x) + (1 - \alpha) f(y) \quad \forall x, y \in \Omega \ \text{and} \ \forall \alpha \in [0, 1]. \tag{2.3}$$

This is equivalent to saying that the set of all points lying on or above the function $f$
is a convex set (this set is known as the epigraph of $f$). A function is called strictly
convex if the '$\leq$' can be replaced with a '$<$' in the above relation. Also, with regards
to  optimization,  we  will  use  the  word  minimum  to  mean  minimum  or  infimum  if  no 
minimum  exists. 
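Inequality (2.3) can be probed numerically. The following is a minimal sketch, not part of the original development: it samples random points and weights on an interval and searches for a violation of (2.3). The function name `violates_convexity`, the two test functions, and the tolerance are assumptions chosen for illustration. Finding no violation is only evidence of convexity, never a proof, whereas a single violation does show that a function is not convex.

```python
import random

def violates_convexity(f, lo, hi, trials=1000, tol=1e-12, seed=0):
    """Search for a counterexample to inequality (2.3) on the interval [lo, hi].

    Returns a tuple (x, y, alpha) that violates the inequality, or None if
    no violation was found among the sampled points.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        x, y = rng.uniform(lo, hi), rng.uniform(lo, hi)
        alpha = rng.random()
        lhs = f(alpha * x + (1 - alpha) * y)
        rhs = alpha * f(x) + (1 - alpha) * f(y)
        if lhs > rhs + tol:  # small tolerance absorbs floating-point rounding
            return (x, y, alpha)
    return None

# x^2 is convex; the double-well (x^2 - 1)^2 is not (illustrative choices).
convex_result = violates_convexity(lambda x: x * x, -2.0, 2.0)
nonconvex_result = violates_convexity(lambda x: (x * x - 1.0) ** 2, -2.0, 2.0)
```

The double-well function has two separated minima, which is the kind of structure a convex cost function cannot have.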

We now state a theorem concerning the convexity of the set of minima of a convex
function. The theorem follows from Weierstrass’ Theorem and some well-known
properties  of  convex  functions. 

Theorem 2.1 (Minima of Convex Functions) For a continuous, convex function
$f : \Omega \mapsto \mathbb{R}$, where the domain $\Omega \subset \mathbb{R}^n$ is convex, if any of the following are
true:

1. $\Omega$ is bounded

2. There exists a scalar $\gamma$ such that the level set $\{x \in \Omega \mid f(x) \leq \gamma\}$ is nonempty
and bounded

3. $f$ is such that $\lim_{\|x\| \to \infty} f(x) = \infty$

then the set of global minima of $f$ is non-empty and convex.

Proof  2.1  Please  refer  to  [8]. 


2.3.2  Graph  Laplacians 

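An undirected graph¹ $\mathcal{G} = (\mathcal{I}, \mathcal{E})$ is defined by a set of indexed vertices $\mathcal{I} = \{1, \ldots, n\}$
and a set of edges $\mathcal{E} = \{e_1, \ldots, e_{n_\mathcal{E}}\}$, where $e_i = \{j, k\}$ and $j, k \in \mathcal{I}$. In the context of
our application, a graph is induced in which each agent is identified with a vertex, and
an edge exists between any two agents that are in communication with one another.
Consider a function $w : \mathcal{I} \times \mathcal{I} \mapsto \mathbb{R}_{\geq 0}$ such that $w_{ij} = 0$ $\forall \{i, j\} \notin \mathcal{E}$ and $w_{ij} > 0$
$\forall \{i, j\} \in \mathcal{E}$. We call $w_{ij}$ a weighting over the graph $\mathcal{G}$. Next consider the weighted
graph Laplacian matrix $L$, whose terms are given by

$$L(i, j) = \begin{cases} -w_{ij} & \text{for } i \neq j \\ \sum_{k=1}^{n} w_{ik} & \text{for } i = j. \end{cases} \tag{2.4}$$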


A graph is connected if, for any two vertices, there exists a set of edges that defines
a path between them. The following result is well known in graph theory and will be
useful in proving properties of our distributed, on-line learning algorithm in Chapter 4.


Theorem 2.2 (Graph Laplacians) For a connected, undirected graph, the weighted
graph Laplacian is symmetric, positive semi-definite, $L \geq 0$, and $L$ has exactly one
zero eigenvalue, with the associated eigenvector $\mathbf{1} = [1, \ldots, 1]^T$. In particular,
$L\mathbf{1} = \mathbf{1}^T L = 0$, and $x^T L x > 0$, $\forall x \neq c\mathbf{1}$, $c \in \mathbb{R}$.

Proof  2.2  Please  refer  to  [37]. 
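The properties in Theorem 2.2 are easy to check on a small example. The sketch below builds the weighted Laplacian of (2.4) for a hypothetical four-vertex path graph; the graph, its weights, and the helper names `weighted_laplacian` and `quadratic_form` are assumptions made for illustration. It relies on the standard identity $x^T L x = \sum_{\{i,j\} \in \mathcal{E}} w_{ij} (x_i - x_j)^2$, which makes the positive semi-definiteness visible.

```python
def weighted_laplacian(n, weights):
    """Build the weighted graph Laplacian (2.4) as a list of rows.

    `weights` maps an undirected edge (i, j), 0-indexed, to w_ij > 0.
    Pairs that are absent from the mapping have weight zero.
    """
    L = [[0.0] * n for _ in range(n)]
    for (i, j), w in weights.items():
        L[i][j] -= w   # off-diagonal terms are -w_ij
        L[j][i] -= w
        L[i][i] += w   # diagonal terms accumulate the incident weights
        L[j][j] += w
    return L

def quadratic_form(L, x):
    """Evaluate x^T L x."""
    n = len(x)
    return sum(x[i] * L[i][j] * x[j] for i in range(n) for j in range(n))

# Hypothetical connected path graph 0 - 1 - 2 - 3 with arbitrary positive weights.
edges = {(0, 1): 1.0, (1, 2): 2.0, (2, 3): 0.5}
L = weighted_laplacian(4, edges)

row_sums = [sum(row) for row in L]                     # entries of L 1
ones_form = quadratic_form(L, [1.0] * 4)               # 1^T L 1
other_form = quadratic_form(L, [1.0, 0.0, 0.0, 2.0])   # x is not a multiple of 1
```

For this graph the row sums and `ones_form` vanish, while `other_form` is strictly positive, in line with the theorem.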

¹ We will only be dealing with undirected graphs in this thesis. When we refer to a graph it is
assumed to be undirected.

2.4 Multi-Robot System Model


In  this  section  we  introduce  the  basic  multi-robot  system  model  that  will  be  used 
throughout  the  thesis  and  provide  a  condensed  review  of  gradient  systems  and  their 
convergence  and  stability  properties. 

Let there be $n$ robots, where robot $i$ has a position $p_i \in \mathcal{P} \subset \mathbb{R}^{d_p}$. The state
space for a single robot is $\mathcal{P}$ and $d_p$ is the dimension of the state space. Consider
a vector $P \in \mathcal{P}^n \subset \mathbb{R}^{d_p n}$ which is the vector obtained by stacking all of the robot
positions together, $P = [p_1^T \cdots p_n^T]^T$. We will refer to the vector $P$ as the configuration
of the robots since a single point $P$ in the high dimensional space $\mathcal{P}^n$ represents the
positions of all the $n$ robots in the low dimensional space $\mathcal{P}$. Now consider a cost
function $\mathcal{H} : \mathcal{P}^n \mapsto \mathbb{R}$ that represents the suitability of a given configuration of
robots for a given task. We will alternately write $\mathcal{H}(P)$ or $\mathcal{H}(p_1, \ldots, p_n)$ in referring
to a specific value of the function for a configuration $P$. Let the cost function be
differentiable everywhere on $\mathcal{P}^n$ so that its partial derivative with respect to each
robot’s position is well-defined, $\partial \mathcal{H} / \partial p_i$. Also, let $\partial \mathcal{H} / \partial p_i$ be locally Lipschitz on $\mathcal{P}^n$
to guarantee the existence of solutions of our system.

Let  the  robots  have  simple  integrator  dynamics 

$$\dot{p}_i = u_i, \tag{2.5}$$


where  Ui  is  the  control  input  to  robot  i.  This  assumption  will  be  used  throughout  the 
thesis  unless  otherwise  explicitly  stated.  We  have  found  in  experiments  that  a  fast 
low  level  control  loop  is  sufficient  in  many  cases  to  approximate  integrator  dynamics. 
The  standard  form  for  the  controllers  used  in  this  thesis  is  given  by 


$$u_i = -k \frac{\partial \mathcal{H}(P)}{\partial p_i}, \tag{2.6}$$

where $k \in \mathbb{R}_{>0}$ is a positive control gain. The closed loop multi-robot system is then
represented by the $n$ coupled differential equations

$$\boxed{\dot{p}_i = -k \frac{\partial \mathcal{H}(P)}{\partial p_i}.} \tag{2.7}$$

Equation (2.7) is the subject of this thesis. We will refer to it repeatedly as the
“multi-robot system,” the “closed-loop dynamics,” or the “gradient system,” and
we display it in a box for emphasis. The relationship between the motion of the
group of robots in their environment and the motion of a single point in the high
dimensional configuration space is shown in Figure 2-1.

Figure 2-1: This figure shows the relationship between the motions of individual
robots in their environment and the trajectory of a single point in a high dimensional
configuration space.
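To make the closed-loop dynamics (2.7) concrete, the following sketch integrates them with a forward-Euler step for one illustrative cost function. The cost $\mathcal{H}(P) = \frac{1}{2} \sum_{i<j} \|p_i - p_j\|^2$, the gain, the step size, and the initial positions are all assumptions made for this example; they are not the coverage cost developed later. This particular cost is convex, so the robots are expected to gather at a common point.

```python
def gradient(P):
    """Gradient of the illustrative cost H(P) = 1/2 * sum_{i<j} ||p_i - p_j||^2.

    The component for robot i is sum_j (p_i - p_j).
    """
    d = len(P[0])
    return [[sum(p[a] - q[a] for q in P) for a in range(d)] for p in P]

def cost(P):
    """Evaluate the illustrative cost H(P)."""
    total = 0.0
    for i in range(len(P)):
        for j in range(i + 1, len(P)):
            total += 0.5 * sum((a - b) ** 2 for a, b in zip(P[i], P[j]))
    return total

def simulate(P, k=1.0, dt=0.05, steps=100):
    """Forward-Euler integration of (2.7); returns the final P and the cost history."""
    history = [cost(P)]
    for _ in range(steps):
        G = gradient(P)
        P = [[p[a] - k * dt * g[a] for a in range(len(p))] for p, g in zip(P, G)]
        history.append(cost(P))
    return P, history

# Three robots in the plane (hypothetical initial configuration).
P0 = [[0.0, 0.0], [4.0, 0.0], [2.0, 3.0]]
P_final, H_history = simulate(P0)
```

With these settings the cost decreases at every step and the three positions approach the centroid of the initial configuration. A forward-Euler scheme is only a stand-in for the continuous-time system; too large a product of gain and step size would make the discrete iteration unstable even though (2.7) itself is not.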


2.4.1  The  Induced  Graph 

The  multi-robot  system  (2.7)  induces  a  graph  in  the  following  way.  The  gradient 
component $\partial \mathcal{H} / \partial p_i$ may only depend upon some of the other robots in the network.
Let $\mathcal{N}_i$ be the set of indices of the other robots upon which the gradient component
$\partial \mathcal{H} / \partial p_i$ depends, and let $P_{\mathcal{N}_i}$ be the vector of the positions of those robots. We have

$$\dot{p}_i = -k \frac{\partial \mathcal{H}(P_{\mathcal{N}_i})}{\partial p_i}.$$

For robot $i$ to compute its own controller, it only needs information from the robots
in $\mathcal{N}_i$. Therefore a graph is induced in which each robot is a vertex, and there is an
edge  between  any  two  robots  that  require  one  another’s  positions  to  compute  their 
gradient  component.  The  controller  is  then  said  to  be  distributed  over  this  graph. 
For  example,  in  the  next  section  we  will  describe  a  controller  that  is  distributed  over 
the  Delaunay  graph. 

Of course it may be the case that a particular hardware platform and particular environmental
conditions render a controller infeasible given the communication graph.
In this case an approximation must be used. We recommend two possible strategies
for approximating the controller on a given communication graph: 1) each robot computes
its controller using only the robots with which it is in communication, and 2)
each robot maintains an estimate of the positions of the robots in $\mathcal{N}_i$ and uses these
estimates to compute its controller. In this thesis we assume, unless otherwise stated,
that the communication graph is sufficient for each robot to compute its controller,
since our emphasis is more on dynamical properties of the controllers than on network
properties.


2.4.2  Properties  of  Gradient  Systems 


We  can  equivalently  express  the  n  coupled  equations  in  (2.7)  as  a  single  equation 
using  the  configuration  vector  P  as 


$$\dot{P} = -k \frac{\partial \mathcal{H}(P)}{\partial P}. \tag{2.8}$$

From (2.8) it is clear that our multi-robot system is a gradient system, meaning the
right hand side of the governing differential equation is proportional to the negative
gradient of the scalar valued cost function $\mathcal{H}$. Gradient systems have particularly
simple and powerful convergence and stability properties, the most important of which
will be given here.

Theorem 2.3 (Global Convergence of Gradient Systems) Let
$\Omega = \{P^* \mid \partial \mathcal{H} / \partial P |_{P^*} = 0\}$ be the set of all critical points of $\mathcal{H}$. If $\mathcal{H}$ is radially
unbounded, or if all trajectories of the system are bounded, then all trajectories of the
system $\dot{P} = -k \, \partial \mathcal{H} / \partial P$ converge asymptotically to $\Omega$.

Proof 2.3 The theorem follows as a corollary to LaSalle’s Invariance Principle [49,
55, 103]. Let $\mathcal{H}$ be the Lyapunov function candidate. Then $\dot{\mathcal{H}} = -k \|\partial \mathcal{H} / \partial P\|^2 \leq 0$,
and if $\mathcal{H}$ is radially unbounded, the trajectories of the system are bounded, therefore
by LaSalle’s Invariance Principle all trajectories converge to the largest invariant
set contained in $\Omega$. By the definition of the dynamics, $\Omega$ itself is an invariant set,
therefore all trajectories converge to $\Omega$.
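The expression for the time derivative of the Lyapunov function candidate used in the proof follows from the chain rule applied along trajectories of the gradient system. Written out as a short worked step, with the cost function denoted $\mathcal{H}$ and the gain $k > 0$:

```latex
\dot{\mathcal{H}}
  = \frac{\partial \mathcal{H}}{\partial P}^{T} \dot{P}
  = \frac{\partial \mathcal{H}}{\partial P}^{T}
    \left( -k \frac{\partial \mathcal{H}}{\partial P} \right)
  = -k \left\| \frac{\partial \mathcal{H}}{\partial P} \right\|^{2}
  \leq 0 .
```

Equality holds only where the gradient vanishes, that is, exactly on the set of critical points, which is why the invariant set identified by LaSalle's principle lies inside that set.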

Remark 2.1 This result does not necessarily imply that the trajectories converge to a
single point in $\Omega$. However, this is true if $\Omega$ is a set of isolated points. Furthermore, if
the system ever reaches a point $P^* \in \Omega$, it will stay at that point for all time, whether
or not it is an isolated critical point, since $\dot{P} = 0$ $\forall t \geq 0$ at such a point.

The following useful result pertains to the local stability of critical points of $\mathcal{H}$.

Theorem 2.4 (Local Stability of Equilibria of Gradient Systems) Let $P^*$ be
a critical point of $\mathcal{H}$. Then $P^*$ is a locally asymptotically stable equilibrium of the
gradient system $\dot{P} = -k \, \partial \mathcal{H} / \partial P$ if and only if $P^*$ is an isolated minimum of $\mathcal{H}$.

Proof 2.4 Please see [43] Chapter 9, Section 4, corollary to Theorem 1.

Remark 2.2 Theorem 2.3 is concerned with all critical points of $\mathcal{H}$: maxima, minima,
and saddle points. However, it is intuitively clear that the system ought to prefer
minima. This intuition is made precise in Theorem 2.4. There are initial conditions
for which the system will converge to a saddle point, or a maximum, but these critical
points are not locally stable. That is, a perturbation will cause the system to leave
the critical point. Minima, on the other hand, are locally stable. They are robust to
perturbations.

We will use Theorems 2.3 and 2.4 to prove convergence and stability for many
particular controllers throughout the thesis.

2.5  Voronoi  Coverage  Control 

Cortes et al. [26] proposed a controller for deploying a group of robots over an environment
to provide sensor coverage of the environment. The controller uses a Voronoi
tessellation  to  divide  up  the  environment  among  the  robots,  each  robot  being  in 
charge  of  sensing  over  its  Voronoi  cell.  In  this  thesis  we  use  this  controller,  which 
we  call  the  Voronoi  controller,  as  a  baseline  strategy.  We  will  demonstrate  that  the 
Voronoi  controller  is  a  special  case  of  a  more  general  class  of  coverage  controllers.  It  is 
necessary  therefore  to  provide  a  motivation  and  derivation  for  the  Voronoi  controller 
before  proceeding  to  our  contributions  in  the  following  chapters.  In  this  section  we 
review the Voronoi controller and prove some crucial details about its derivation.

2.5.1  Voronoi  Cost  Function 

The  cost  function  upon  which  the  Voronoi  controller  is  based  is  adapted  from  the  field 
of locational optimization, which addresses how to optimally place retail or industrial
facilities  [30, 114].  The  canonical  example  is  placing  retail  facilities  to  minimize  the 
aggregate  travel  time  of  a  population  of  customers. 

Consider a multi-robot system as in Section 2.7 in which the robots are positioned
in a convex bounded environment $Q$. An arbitrary point in $Q$ is denoted $q$ and the
robot positions $p_i \in Q = \mathcal{P}$. Define the sensory function, $\phi : Q \mapsto \mathbb{R}_{>0}$, and let
it be known to all of the robots. The sensory function should be thought of as a
weighting of importance over $Q$. We want to have many robots where $\phi(q)$ is large,
and few where it is small. For now we will assume that the function $\phi(q)$ is known by
the robots in the network. In Chapter 4 we will relax this requirement with on-line
learning.

The  precise  definition  of  the  sensory  function  depends  on  the  desired  application. 
In an application in which a team of robots are used to clean up an oil spill, an
appropriate choice for the sensory function would be the concentration of the oil
as a function of position in the environment. For a human surveillance application
in which robots use audio sensors, $\phi(q)$ may be chosen to be the intensity of the
frequency range corresponding to the human voice. The sensory function $\phi(q)$ may
also be the probability density function relating to the occurrence of events in $Q$.

The  robots  are  equipped  with  sensors  and  the  quality  of  sensing  is  assumed  to 
decrease according to a differentiable, strictly increasing function $f : \mathbb{R}_{\geq 0} \mapsto \mathbb{R}$.
Specifically, $f(\|q - p_i\|)$ describes how costly is the measurement of the information
at $q$ by a sensor at $p_i$. This form of $f(x)$ is physically appealing since it is reasonable
that sensing will become more unreliable farther from the sensor. Then the standard
Voronoi cost function can be written

$$\mathcal{H}(P) = \int_Q \min_{i \in \{1, \ldots, n\}} f(\|q - p_i\|) \phi(q) \, dq. \tag{2.9}$$

The  minimum  over  sensors  reflects  the  fact  that  a  point  q  should  be  the  responsibility 
of  the  sensor  that  has  the  best  sensing  performance  at  q.  The  problem  of  covering 
the environment $Q$ is now formulated as moving the robots to a configuration $P^*$ to
minimize $\mathcal{H}$.

The  cost  function  (2.9)  has  been  used  in  a  wide  variety  of  applications  including 
data  compression,  allocation  of  resources,  and  placement  of  industrial  and  commercial 
facilities.  In  the  following  section  we  consider  computing  the  gradient  of  (2.9)  in  order 
to  design  a  gradient  descent  controller. 
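Before differentiating (2.9), it can help to evaluate it numerically. The sketch below approximates the integral with a midpoint rule on a grid over the unit square. The environment $Q = [0, 1]^2$, the choices $f(x) = \frac{1}{2} x^2$ and $\phi(q) = 1$, the grid resolution, and the two robot configurations are all assumptions made for illustration, and `voronoi_cost` is a hypothetical helper name.

```python
import math

def voronoi_cost(P, f, phi, cells=20):
    """Midpoint-rule approximation of (2.9) over the unit square [0, 1]^2."""
    h = 1.0 / cells
    total = 0.0
    for a in range(cells):
        for b in range(cells):
            q = ((a + 0.5) * h, (b + 0.5) * h)
            # The min over robots assigns q to the sensor with the lowest cost.
            best = min(f(math.dist(q, p)) for p in P)
            total += best * phi(q) * h * h
    return total

f = lambda x: 0.5 * x * x   # assumed sensing cost, strictly increasing on x >= 0
phi = lambda q: 1.0         # assumed uniform sensory function

spread = [(0.25, 0.25), (0.75, 0.25), (0.25, 0.75), (0.75, 0.75)]
clustered = [(0.45, 0.45), (0.55, 0.45), (0.45, 0.55), (0.55, 0.55)]

H_spread = voronoi_cost(spread, f, phi)
H_clustered = voronoi_cost(clustered, f, phi)
```

Under these assumptions the spread-out configuration gives a lower cost than the clustered one, which matches the intuition that minimizing (2.9) disperses the robots over a uniformly weighted environment.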

Consider the minimization of (2.9),

$$\min_P \mathcal{H}(P) = \min_P \int_Q \min_i f(\|q - p_i\|)\,\phi(q)\,dq.$$

The minimum inside the integral induces a partition of $Q$ into non-overlapping cells, $V_i$, to give

$$\min_P \mathcal{H}(P) = \min_P \sum_{i=1}^n \int_{V_i} f(\|q - p_i\|)\,\phi(q)\,dq, \tag{2.10}$$
where $V_i = \{q \in Q \mid f(\|q - p_i\|) \leq f(\|q - p_j\|)\ \forall j \neq i\}$. Since $f$ is strictly increasing, this is equivalent to

$$V_i = \{q \in Q \mid \|q - p_i\| \leq \|q - p_j\|\ \forall j \neq i\}. \tag{2.11}$$

The region $V_i$ is the Voronoi cell of $p_i$. The collection of Voronoi cells is called the Voronoi tessellation$^2$ [70] of $Q$, an example of which is shown in Figure 2-2(a).

2.5.2  Computations  with  Voronoi  Tessellations 

We now define a number of quantities relating to Voronoi cells. Let $\partial V_i$ and $\partial Q$ be the boundary of $V_i$ and $Q$, respectively. By $q_{\partial V_i}(P)$ we mean a point $q \in \partial V_i$, and $n_{\partial V_i}$ is the outward facing unit normal of $\partial V_i$. Given a robot $i$, we define $\mathcal{N}_i$ as the index set of robots that share Voronoi boundaries with $V_i$, $\mathcal{N}_i = \{j \mid V_i \cap V_j \neq \emptyset\}$. We denote the set of points on the Voronoi boundary shared by agents $i$ and $j$ as $l_{ij} = V_i \cap V_j$, as shown in Fig. 2-2. Then $q_{l_{ij}}(p_i, p_j)$ is a point on that shared boundary, and $n_{l_{ij}}$ is the unit normal of $l_{ij}$ from $p_i$ to $p_j$. By the definition of the Voronoi cell (2.11), we know the following facts:


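$$\partial V_i = \Big(\bigcup_{j \in \mathcal{N}_i} l_{ij}\Big) \cup (\partial V_i \cap \partial Q), \tag{2.12}$$

$$l_{ij} = l_{ji}, \tag{2.13}$$

$$n_{l_{ij}} = -n_{l_{ji}}. \tag{2.14}$$

It can also be proved that the shared boundaries $l_{ij}$ are hyperplanes, and the Voronoi cells are convex.

The following lemma states an important fact about the cost function (2.10).

Lemma 2.1 (Cancellation of Boundary Terms) The gradient of $\mathcal{H}(P)$ is given by

$$\frac{\partial \mathcal{H}}{\partial p_i} = \int_{V_i} \frac{\partial}{\partial p_i} f(\|q - p_i\|)\,\phi(q)\,dq. \tag{2.15}$$

$^2$It is convenient for us to use $\leq$ in the definition of Voronoi cell, rather than the more common definition with $<$.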
(a) Voronoi Tessellation (b) Detail of Voronoi Cell Boundary

Figure 2-2: An example of a Voronoi tessellation is shown on the left, and the quantities and constraints associated with the Voronoi boundary shared between neighboring agents are shown on the right.


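Proof 2.5 Differentiating under the integral sign [32], we have

$$\frac{\partial \mathcal{H}}{\partial p_i} = \int_{V_i} \frac{\partial}{\partial p_i} f(\|q - p_i\|)\,\phi(q)\,dq
+ \int_{\partial V_i} f(\|q - p_i\|)\,\phi(q)\,\frac{\partial q_{\partial V_i}}{\partial p_i}^T n_{\partial V_i}\,dq
+ \sum_{j \in \mathcal{N}_i} \int_{l_{ji}} f(\|q - p_j\|)\,\phi(q)\,\frac{\partial q_{l_{ji}}}{\partial p_i}^T n_{l_{ji}}\,dq,$$

where $\partial q_{\partial V_i}/\partial p_i$ and $\partial q_{l_{ji}}/\partial p_i$ are $d \times d$ matrices. Using (2.12), (2.13), and (2.14),

$$\frac{\partial \mathcal{H}}{\partial p_i} = \int_{V_i} \frac{\partial}{\partial p_i} f(\|q - p_i\|)\,\phi(q)\,dq
+ \int_{\partial V_i \cap \partial Q} f(\|q - p_i\|)\,\phi(q)\,\frac{\partial q_{\partial V_i}}{\partial p_i}^T n_{\partial V_i}\,dq
+ \sum_{j \in \mathcal{N}_i} \int_{l_{ij}} \big(f(\|q - p_i\|) - f(\|q - p_j\|)\big)\,\phi(q)\,\frac{\partial q_{l_{ij}}}{\partial p_i}^T n_{l_{ij}}\,dq.$$

By definition of $l_{ij}$, $\|q - p_i\| = \|q - p_j\|\ \forall q \in l_{ij}$, so the last sum vanishes. Since points on the boundary of the environment do not change position as a function of $p_i$, we have

$$\frac{\partial q_{\partial V_i}}{\partial p_i} = 0 \quad \forall q \in \partial V_i \cap \partial Q,$$

and the second term vanishes. □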


Remark 2.3 Lemma 2.1 is surprising for its simplicity. One might expect the gradient of $\mathcal{H}$ to be more complicated due to the fact that the Voronoi tessellation which defines the boundary of the integrals is a function of the robots' positions. Essentially, all of the complicated boundary terms cancel out, leaving a simple expression for the gradient.

Remark 2.4 For an agent to compute its gradient component (2.15) it must be able to compute its Voronoi cell, which means it must know the positions of its Voronoi neighbors. It is a common assumption in the literature [26, 11, 81] that a robot knows the positions of its Voronoi neighbors either by sensing or communication. Unfortunately, this assumption presents a practical conundrum: one does not know beforehand how far away the farthest Voronoi neighbor will be, thus this assumption cannot be translated into a communication range constraint (aside from the conservative requirement for each robot to have a communication range as large as the diameter of $Q$). In practice, only Voronoi neighbors within a certain distance will be in communication, in which case results can be derived, though with considerable complication [25]. We will take this assumption as implicit. Indeed, our experimental and numerical results suggest that performance degrades gracefully with decreasing communication range among robots.


We now restrict ourselves to the case in which $f(x) = \frac{1}{2}x^2$. This form of $f(x)$ is appropriate for light-based sensors, for example cameras, infrared detectors, or laser scanners, since the intensity from a light source drops off with the square of the distance to the source. The cost function becomes

$$\mathcal{H}(P) = \sum_{i=1}^n \int_{V_i} \frac{1}{2}\|q - p_i\|^2\,\phi(q)\,dq. \tag{2.16}$$
Define the mass, first moment, and centroid of the Voronoi cell $V_i$ as

$$M_{V_i} = \int_{V_i} \phi(q)\,dq, \quad L_{V_i} = \int_{V_i} q\,\phi(q)\,dq, \quad \text{and} \quad C_{V_i} = L_{V_i}/M_{V_i}, \tag{2.17}$$

respectively. Note that $f(x)$ strictly increasing and $\phi(q)$ strictly positive imply both $M_{V_i} > 0\ \forall V_i \neq \emptyset$ and $C_{V_i} \in V_i \setminus \partial V_i$ ($C_{V_i}$ is in the interior of $V_i$). Thus $M_{V_i}$ and $C_{V_i}$ have properties intrinsic to physical masses and centroids.
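As an illustration of the quantities in (2.17), the following is a minimal numerical sketch, not part of the original development. It assumes the environment is the unit square sampled on a regular grid, a constant sensory function, and two robots; the helper names `voronoi_labels` and `cell_moments` are hypothetical. The last line evaluates the gradient expression $-M_{V_i}(C_{V_i} - p_i)$ for the quadratic sensing cost.

```python
import numpy as np

def voronoi_labels(points, robots):
    """Index of the nearest robot for every sample point (ties go to the lowest index)."""
    d2 = ((points[:, None, :] - robots[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def cell_moments(points, weights, robots):
    """Approximate M_{V_i}, L_{V_i}, and C_{V_i} by weighted sums over sample points.

    weights[k] should equal phi(points[k]) times the area represented by points[k].
    """
    labels = voronoi_labels(points, robots)
    n, d = robots.shape
    mass = np.zeros(n)
    first = np.zeros((n, d))
    np.add.at(mass, labels, weights)
    np.add.at(first, labels, weights[:, None] * points)
    centroid = first / mass[:, None]  # assumes every cell contains at least one sample
    return mass, first, centroid

# Unit square sampled at the cell centres of a 200 x 200 grid, with phi(q) = 1.
m = 200
axis = (np.arange(m) + 0.5) / m
gx, gy = np.meshgrid(axis, axis, indexing="ij")
points = np.column_stack([gx.ravel(), gy.ravel()])
weights = np.full(len(points), 1.0 / m**2)

robots = np.array([[0.25, 0.5], [0.75, 0.5]])
mass, first, centroid = cell_moments(points, weights, robots)
gradient = -mass[:, None] * (centroid - robots)  # gradient for f(x) = x^2 / 2
```

In this symmetric configuration each robot already sits at the centroid of its half of the square, so the computed gradient is zero up to rounding.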

Using this notation, $\partial \mathcal{H}/\partial p_i$ simplifies to

$$\frac{\partial \mathcal{H}}{\partial p_i} = -\int_{V_i} (q - p_i)\,\phi(q)\,dq = -M_{V_i}(C_{V_i} - p_i). \tag{2.18}$$

Critical points of $\mathcal{H}$ (configurations where the gradient is zero) are those in which every agent is at the centroid of its Voronoi cell, $p_i = C_{V_i}\ \forall i$. The resulting partition of the environment is commonly called a Centroidal Voronoi Configuration (CVC). It is known that CVCs can correspond to local maxima, minima, or saddle points of $\mathcal{H}$. Finding global minima of $\mathcal{H}$ is known to be difficult (NP-hard for a given discretization of $Q$) even in the fully centralized case. Next, we present a distributed control law that is used in [26] to make the robots converge to a CVC.


2.5.3  Voronoi  Controller 

A classic discrete-time method to compute CVCs is Lloyd's algorithm [60]. In each iteration this method executes three steps: (i) compute the Voronoi regions; (ii) compute the centroids; (iii) move each $p_i$ to its corresponding centroid.
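The three steps of Lloyd's algorithm can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the implementation used in the thesis: the environment is the unit square sampled on a grid, the sensory function is constant, and the names `lloyd_step` and `coverage_cost` are hypothetical.

```python
import numpy as np

def lloyd_step(points, weights, robots):
    """One Lloyd iteration on a sampled environment."""
    # (i) Voronoi regions: assign each sample to its nearest robot.
    d2 = ((points[:, None, :] - robots[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    new_robots = robots.copy()
    for i in range(len(robots)):
        in_cell = labels == i
        mass = weights[in_cell].sum()
        if mass > 0:  # leave a robot in place if its sampled cell is empty
            # (ii) centroid of the cell; (iii) move the robot there.
            new_robots[i] = (weights[in_cell][:, None] * points[in_cell]).sum(axis=0) / mass
    return new_robots

def coverage_cost(points, weights, robots):
    """Sampled cost: sum over samples of 1/2 min_i ||q - p_i||^2 phi(q)."""
    d2 = ((points[:, None, :] - robots[None, :, :]) ** 2).sum(axis=2)
    return 0.5 * (d2.min(axis=1) * weights).sum()

m = 60
axis = (np.arange(m) + 0.5) / m
gx, gy = np.meshgrid(axis, axis, indexing="ij")
points = np.column_stack([gx.ravel(), gy.ravel()])
weights = np.full(len(points), 1.0 / m**2)  # phi(q) = 1 on the unit square

robots = np.random.default_rng(0).random((5, 2))
costs = [coverage_cost(points, weights, robots)]
for _ in range(100):
    robots = lloyd_step(points, weights, robots)
    costs.append(coverage_cost(points, weights, robots))
```

On the sampled environment each iteration cannot increase the cost: moving a robot to its cell centroid minimizes the quadratic cost for a fixed assignment, and reassigning samples to their nearest robot can only lower it further.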

In [26] a continuous-time version of this approach is proposed for robots with simple integrator dynamics as in (2.5). The control law is given by

$$u_i = k(C_{V_i} - p_i) \tag{2.19}$$

and it guarantees that the system converges to a CVC. This control law gives a variation on the multi-robot system (2.7), with

$$\dot{p}_i = -\frac{k}{M_{V_i}} \frac{\partial \mathcal{H}}{\partial p_i}. \tag{2.20}$$


The proof of convergence is similar to that of Theorem 2.3. Firstly, we must restrict the state space $\mathcal{S} \subset \mathcal{P}^n$ to those configurations for which $p_i \neq p_j\ \forall i \neq j$, that is $\mathcal{S} = \{P = [p_1^T \cdots p_n^T]^T \mid p_i \neq p_j\ \forall i \neq j\}$. It is known from [26] that $\mathcal{S}$ is invariant under the control law (2.19) (this relies upon $Q$ being convex). Now define the set of CVCs $\Omega = \{P \mid p_i = C_{V_i}\ \forall i\}$. Notice from $\partial \mathcal{H}/\partial p_i$ that $\Omega$ is precisely the set of critical points of $\mathcal{H}$ over $\mathcal{P}^n$. This leads to the following convergence result, which is similar to Theorem 2.3, but has a slightly more subtle proof.


Theorem 2.5 (Voronoi Convergence) The system with dynamics $\dot{p}_i = k(C_{V_i} - p_i)$, $i = 1, 2, \ldots, n$, converges asymptotically to the set of centroidal Voronoi configurations $\Omega$.


Proof 2.6 The theorem follows from LaSalle's Invariance Principle [49, 55, 103]. By assumption, the domain of the robots $Q$ is bounded, therefore $\mathcal{P}^n$ is bounded. $\mathcal{P}^n$ is also invariant under the control law (2.19), therefore all trajectories are bounded. The control law is locally Lipschitz (this can be verified directly from the definition of locally Lipschitz). Computing $\dot{\mathcal{H}}$ along the trajectories of the system we find $\dot{\mathcal{H}} = -\sum_{i=1}^n k M_{V_i}\|C_{V_i} - p_i\|^2 \leq 0$, so all the conditions of LaSalle's Invariance Principle are satisfied, and the system converges to the largest invariant set in $\Omega$, but $\Omega$ itself is invariant, so the system converges to $\Omega$.
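The decrease of the cost along trajectories can be checked numerically. The sketch below integrates the control law $u_i = k(C_{V_i} - p_i)$ with forward Euler steps on a sampled environment and records the cost at every step. Everything specific here is an assumption made for illustration: the Gaussian-shaped sensory function, the gain `k`, the step `dt`, the random initial positions, and the helper name `centroids_and_cost`.

```python
import numpy as np

def centroids_and_cost(points, weights, robots):
    """Sampled Voronoi centroids C_{V_i} and the sampled quadratic coverage cost."""
    d2 = ((points[:, None, :] - robots[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    cost = 0.5 * (d2.min(axis=1) * weights).sum()
    centroids = robots.copy()
    for i in range(len(robots)):
        w = weights[labels == i]
        if w.sum() > 0:  # an empty sampled cell leaves u_i = 0
            centroids[i] = (w[:, None] * points[labels == i]).sum(axis=0) / w.sum()
    return centroids, cost

m = 50
axis = (np.arange(m) + 0.5) / m
gx, gy = np.meshgrid(axis, axis, indexing="ij")
points = np.column_stack([gx.ravel(), gy.ravel()])
# Hypothetical sensory function: a Gaussian bump plus a small positive floor.
phi = 0.1 + np.exp(-((points - np.array([0.7, 0.3])) ** 2).sum(axis=1) / (2 * 0.15**2))
weights = phi / m**2

k, dt = 1.0, 0.2  # illustrative gain and Euler step, chosen so that k * dt <= 1
robots = np.random.default_rng(1).random((6, 2))
costs = []
for _ in range(200):
    centroids, cost = centroids_and_cost(points, weights, robots)
    costs.append(cost)
    robots = robots + dt * k * (centroids - robots)  # Euler step of the control law
```

With `k * dt <= 1` each robot moves part of the way toward its current centroid, so the sampled cost is non-increasing from step to step, mirroring the sign of $\dot{\mathcal{H}}$ in the proof. This is a property of the discretized sketch and is not a substitute for the continuous-time argument.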

Remark 2.5 Theorem 2.4 applies directly to this controller as well. Unfortunately, it is difficult to determine for this cost function which, if any, critical points are isolated minima. Indeed, this appears to be strongly dependent upon the specific geometry of $Q$ and $\phi(q)$. For example, if $Q \subset \mathbb{R}^2$ is a circular disc and $\phi(q)$ is constant, then the set of critical points can be shown to be a closed orbit in $\mathbb{R}^{2n}$ (this is not difficult to visualize considering the symmetry of the problem).
2.6 Synopsis


In  this  chapter  we  laid  the  groundwork  for  the  remainder  of  the  thesis.  Firstly,  we 
situated  our  work  in  the  research  literature.  Next,  we  fixed  the  mathematical  notation 
that  will  be  used  throughout  the  thesis.  We  then  formulated  the  main  multi-robot 
system  model  to  be  used  throughout  the  thesis  and  stated  two  foundational  theorems, 
one  concerning  the  convergence  of  gradient  systems  to  the  set  of  critical  points  of  their 
associated  cost  function,  and  one  stating  that  only  isolated  local  minima  are  stable. 
Finally,  we  described  in  detail  an  existing  multi-robot  coverage  control  strategy  that 
relies  upon  a  Voronoi  tessellation,  and  we  proved  the  convergence  of  that  strategy  to 
a  centroidal  Voronoi  configuration. 

The  Voronoi  controller  (2.19)  serves  as  a  baseline  throughout  the  thesis.  We  show 
in  Chapter  3  that  it  is,  in  fact,  a  limiting  instance  of  a  more  general  class  of  coverage 
controllers  that  model  a  broad  range  of  sensing  and  actuation  modalities.  We  add 
stable on-line learning to learn the sensory function $\phi(q)$ in Chapter 4, and we look
at  three  detailed  case  studies  with  experiments  in  Chapters  5,  6,  and  7. 
Chapter 3


Generalized  Coverage  Control 

3.1  Introduction 

In  this  chapter  we  introduce  two  of  the  main  theoretical  contributions  of  the  thesis. 
Firstly  we  propose  a  general  multi-robot  cost  function  which  can  be  specialized  to 
a  number  of  different  multi-robot  tasks.  The  negative  gradient  of  the  cost  function 
is  used  as  a  multi-robot  controller  as  in  equation  (2.7).  The  cost  function  has  three 
main  components:  a  sensory  function,  a  sensor  cost,  and  a  mixing  function.  We  put 
special emphasis on the role of the mixing function, and show that a particular family of mixing functions acts as a smooth approximation to the Voronoi controller seen in Chapter 2. The Voronoi controller is recovered exactly in the limit as a parameter goes to $-\infty$, while a new probabilistic interpretation is achieved with a parameter value of $-1$. Herding and consensus controllers are also derived with different choices of the mixing function, sensor cost, and sensory function.

Secondly,  we  prove  a  result  that  justifies  the  fact  that  our  controllers  are  proven 
to  converge  only  to  local  minima  of  the  cost  function,  rather  than  global  minima. 
It  is  known  that  gradient  descent  controllers  can  be  proven  to  find  global  minima  of 
convex  functions,  but  in  general  they  can  only  be  proven  to  find  local  minima  for 
nonconvex  functions.  It  is  tempting,  therefore,  to  try  to  find  convex  cost  functions 
for  multi-robot  problems.  We  prove  in  this  chapter  that  any  multi-robot  task  other 
than  consensus  must  be  characterized  by  a  nonconvex  cost  function,  and  therefore, 
in general, one cannot expect better than convergence to a local minimum. This does
not  close  the  door  on  the  problem,  but  rather,  it  focuses  attention  on  finding  special 
classes  of  nonconvex  cost  functions  that  may  allow  for  specialized  convergence  results. 

3.1.1  Related  Work 

Cortes et al. [26] introduced a controller for multi-robot coverage that works by continually driving the robots toward the centroids of their Voronoi cells, as described in Section 2.5.3. This inherently geometric strategy has seen many recent extensions: to robots with a limited sensing radius in [25], to heterogeneous groups of robots and nonconvex environments in [77], and to incorporate learning of unknown environments in [97]. A recent text that presents much of this work in a cohesive fashion is [14], and an excellent overview is given in [63]. Coverage controllers also have been successfully implemented on robotic systems in [94, 96]. In this work we adopt notational conventions from the Voronoi based coverage control literature. Other common methods for coverage control take a probabilistic perspective. For example, [57] proposes an algorithm for positioning robots to maximize the probability of detecting an event that occurs in the environment. Distributed dynamic vehicle routing scenarios are considered in [4, 76], in which events occur according to a random process and are serviced by the robot closest to them. Another common coverage control method is for robots to drive away from one another using artificial potential fields [44]. Despite the rather different models and objectives in these works, there are two common points which motivate us to find a unifying principle: 1) they all rely upon an optimization, and 2) they all use controllers that solve this optimization through the evolution of a dynamical system.

Some existing approaches do not fit under the framework we propose in this chapter. A significant body of work has looked at coverage control as a motion planning problem. A survey of this work can be found in [20], and some significant contributions can be found in, for example, [16, 56] and the citations therein. Other authors have proposed information theoretic algorithms which consider placing sensors sequentially rather than driving them with a controller. Works such as [42, 52] position sensor nodes to maximize information for the sake of estimating a Gaussian random process in the environment.

3.1.2  Contributions 

The  optimization  approach  in  this  chapter  ties  together  much  of  the  existing  literature 
on  coverage  control.  Specifically,  our  contributions  are: 

1. We propose a cost function, putting particular emphasis on the role of a mixing function, a previously unrecognized component that captures critical assumptions about the coverage task. We introduce a family of mixing functions with a free parameter, $\alpha$, and show that different values of the parameter correspond to different assumptions about the coverage task, specifically showing that a minimum variance solution (i.e. a probabilistic strategy) is obtained with a parameter value of $\alpha = -1$, Voronoi coverage (a geometric strategy) is recovered in the limit $\alpha \to -\infty$, and a broad family of potential field based herding and consensus controllers are recovered for positive values of $\alpha$.

2.  We  prove  a  new  result  linking  the  convexity  of  a  cost  function  to  the  multi-agent 
phenomenon  of  consensus.  We  show  that  coverage  tasks  are  fundamentally 
different  from  consensus,  and  that  they  require  the  optimization  of  a  nonconvex 
cost  function.  This  suggests  inherent  limitations  to  gradient  descent  controller 
designs,  which  are  pervasive  in  the  coverage  control  literature. 

The chapter is organized as follows. In Section 3.2 we introduce the cost function, describing the purpose of each of its parts including the mixing function. We then produce a class of provably stable distributed coverage controllers by taking the gradient of the cost function. In Section 3.3 we derive three special cases of the controller: a Voronoi controller, a minimum variance controller, and a potential field controller. Section 3.4 presents our results on the relation between the convexity of a cost function and multi-agent consensus. Simulation results are given in Section 3.5 and a synopsis is presented in Section 3.6.
3.2 Generalized Coverage


In this section we introduce a general multi-agent cost function. We will use this cost function to define a new class of multi-agent controllers by introducing a mixing function, which describes how information from different robots should be combined. We use the cost function to derive stable gradient descent controllers of the form (2.7).

3.2.1  Coverage  Cost  Function 

In keeping with the notation from Chapter 2, let there be $n$ robots, and let robot $i$ have a position $p_i \in \mathcal{P} \subset \mathbb{R}^{d_p}$, where $\mathcal{P}$ is the state space of a robot, and $d_p$ is the dimension of the space. The configuration of the multi-robot system is denoted $P = [p_1^T \cdots p_n^T]^T \in \mathcal{P}^n$. We want our robots to cover a bounded region $Q \subset \mathbb{R}^{d_q}$, which may or may not be related to the position space $\mathcal{P}$ of the robots. For example, the robots may be constrained to move in the space that they cover, so $\mathcal{P} = Q$ as in [26], or the robots may hover over a planar region that they cover with cameras, so $\mathcal{P} \subset \mathbb{R}^3$ and $Q \subset \mathbb{R}^2$, as in [94].

For each robot, a cost of sensing, or servicing, a point $q \in Q$ is given by a function $f(p_i, q)$. For simplicity of analysis we assume that $f(p_i, q)$ takes on only non-negative values, and that it is differentiable with respect to $p_i$ (this can be generalized considerably as in [25]). The sensor measurements of the $n$ robots are combined in a function $g(f(p_1, q), \ldots, f(p_n, q))$, which we will call the mixing function. The mixing function embodies assumptions about the coverage task; that is, by changing the mixing function we can derive Voronoi based coverage control, probabilistic coverage control, and a variety of other kinds of distributed controllers.

Combining these elements, we propose to use a cost function of the form

$$\mathcal{H}(P) = \int_Q g(f(p_1, q), \ldots, f(p_n, q))\,\phi(q)\,dq, \tag{3.1}$$

where $\phi : \mathbb{R}^{d_q} \to \mathbb{R}_{>0}$ (we use the notation $\mathbb{R}_{>0}$ to mean the set of positive real numbers and $\mathbb{R}^d_{>0}$ the set of vectors whose components are all positive, and likewise for $\mathbb{R}_{\geq 0}$ and $\mathbb{R}^d_{\geq 0}$) is a weighting of importance over the region $Q$. Intuitively, the cost of the group of robots sensing at a single arbitrary point $q$ is represented by the integrand $g(f(p_1, q), \ldots, f(p_n, q))$. Integrating over all points in $Q$, weighted by their importance $\phi(q)$, gives the total cost of a configuration of the robots. We want to find controllers that stabilize the robots around configurations $P^*$ that minimize $\mathcal{H}$. We will see in Section 3.4 that for coverage, and many other multi-agent problems, $\mathcal{H}$ is necessarily nonconvex, therefore gradient based controllers will yield locally optimal robot configurations. The cost function (3.1) will be shown to subsume several different kinds of existing coverage cost functions. Drawing out the relations between these different coverage algorithms will suggest new insights into when one algorithm should be preferred over another.


3.2.2  Mixing  Function 

The mixing function $g_\alpha : \mathbb{R}^n_{\geq 0} \to \mathbb{R}$ describes how information from different robots should be combined to give an aggregate cost of the robots sensing at a point $q$. This is shown graphically in Figure 3-1, where the overlap of the two sensors is shown for illustrative purposes as the intersection of two circles. We propose a mixing function of the form

$$g_\alpha(f_1, \ldots, f_n) = \Big(\sum_{i=1}^n f_i^\alpha\Big)^{1/\alpha}, \tag{3.2}$$

with a free parameter $\alpha$. The arguments $f_i \geq 0$ are real valued, and in our context they are given by evaluating the sensor function $f(p_i, q)$, hence the notation $f_i$.

[Figure labels: robot position $p_i$; sensor cost; mixing function]

Figure 3-1: The mixing function is illustrated in this figure. The mixing function determines how information from the sensors of multiple robots is to be combined, shown graphically as the intersection of the two circles in the figure.

This mixing function has several important properties. Firstly, notice that for $\alpha \geq 1$ it is the $p$-norm of the vector $[f_1 \cdots f_n]^T$, with $p = \alpha$. Specifically, it is convex for $\alpha \geq 1$, and as $\alpha \to \infty$, $g_\alpha(\cdot) \to \max_i(\cdot)$, which is the $\ell_\infty$ norm. However, in the regime where $\alpha < 1$, $g_\alpha(\cdot)$ is not a norm because it violates the triangle inequality. In this regime it is also nonconvex; the significance of this will be explored further in Section 3.4. One can readily verify that as $\alpha \to -\infty$, $g_\alpha(\cdot) \to \min_i(\cdot)$. From an intuitive point of view, with $\alpha < 0$, $g_\alpha(\cdot)$ is smaller than any of its arguments alone. That is, the cost of sensing at a point $q$ with robots at $p_i$ and $p_j$ is smaller than the cost of sensing with either one of the robots individually. Furthermore, the decrease in $g_\alpha$ from the addition of a second robot is greater than that from the addition of a third robot, and so on. There is a successively smaller benefit to adding more robots. This property is often called supermodularity, and has been exploited in a rather different way in [52]. Surface plots of $g_\alpha(f_1, f_2)$ for $\alpha = -1$, $1$, and $2$ are shown in Figures 3-2(a), 3-2(b), and 3-2(c), respectively, and the decrease in $g_\alpha(\cdot)$ as the number of arguments grows is shown in Figure 3-2(d). In this work we consider the number of robots to be fixed, but it is useful to illustrate the supermodularity property of the mixing function by considering the successive addition of new robots.
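The limiting behavior of the mixing function is easy to check numerically. The short sketch below evaluates $g_\alpha(f_1, \ldots, f_n) = (\sum_i f_i^\alpha)^{1/\alpha}$ for a few parameter values; the function name `mixing` and the sample values are illustrative assumptions, and the arguments are taken strictly positive so that negative powers are defined.

```python
import numpy as np

def mixing(f, alpha):
    """g_alpha(f_1, ..., f_n) = (sum_i f_i**alpha)**(1/alpha) for strictly positive f_i."""
    f = np.asarray(f, dtype=float)
    return float((f ** alpha).sum() ** (1.0 / alpha))

f = [2.0, 3.0, 5.0]
g_minus1 = mixing(f, -1.0)    # 1 / (1/2 + 1/3 + 1/5) = 30/31
g_one = mixing(f, 1.0)        # plain sum
g_two = mixing(f, 2.0)        # Euclidean norm
g_near_min = mixing(f, -50.0) # close to min(f)
g_near_max = mixing(f, 50.0)  # close to max(f)
g_half = mixing(f, 0.5)       # exceeds max(f): 0 < alpha < 1 does not reduce the cost

# Diminishing benefit of adding identical sensors (each with cost 1) at alpha = -1.
values = [mixing([1.0] * n, -1.0) for n in range(1, 6)]
decreases = [a - b for a, b in zip(values, values[1:])]
```

For $\alpha = -1$ and unit costs the value is $1/n$, so each additional robot lowers the aggregate cost by less than the previous one, which is the diminishing-returns behavior described above.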
(a) Mixing Function Surface ($\alpha = -1$) (b) Mixing Function Surface ($\alpha = 1$) (c) Mixing Function Surface ($\alpha = 2$) (d) Mixing Function Supermodularity ($\alpha = -1$), plotted against the number of robots

Figure 3-2: The proposed mixing function with $\alpha = -1$, $1$, and $2$ is shown in 3-2(a), 3-2(b), and 3-2(c), respectively. The function is convex for $\alpha \geq 1$ and nonconvex otherwise. The nonlinear decrease in the function as more sensors are added, a property known as supermodularity, is shown in Figure 3-2(d).


Including this mixing function in the cost function from (3.1) gives

$$\mathcal{H}_\alpha(P) = \int_Q \Big(\sum_{i=1}^n f(p_i, q)^\alpha\Big)^{1/\alpha} \phi(q)\,dq. \tag{3.3}$$

3.2.3  Gradient  Control 


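We use the multi-robot system equation (2.7) to derive the gradient controller

$$\dot{p}_i = -k \frac{\partial \mathcal{H}_\alpha}{\partial p_i} = -k \int_Q \Big(\frac{f(p_i, q)}{g_\alpha}\Big)^{\alpha - 1} \frac{\partial f(p_i, q)}{\partial p_i}\,\phi(q)\,dq. \tag{3.4}$$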
To provide some intuition about the meaning of this function, notice that in the case that $f(p_i, q)$ is strictly increasing in the distance $\|q - p_i\|$, the function inside the integral, $(f(p_i, q)/g_\alpha)^{\alpha - 1}$, gives an approximation to the indicator function$^2$ of the Voronoi cell of agent $i$, the approximation improving as $\alpha \to -\infty$. This is shown graphically in Figure 3-3.

(b) $\alpha = -1$ (c) $\alpha = -5$ (d) $\alpha = -10$

Figure 3-3: Contour plots of $(f(p_i, q)/g_\alpha)^{\alpha - 1}$ are shown for a configuration of ten agent positions. The Voronoi tessellation is shown as well for comparison. As the parameter $\alpha$ approaches $-\infty$, $(f(p_i, q)/g_\alpha)^{\alpha - 1}$ becomes closer to the indicator function of the Voronoi cell $V_i$.

For simplicity, we choose the function $f(p_i, q)$ to be

$$f(p_i, q) = \frac{1}{2}\|q - p_i\|^2, \quad \text{so that} \quad \frac{\partial f(p_i, q)}{\partial p_i} = -(q - p_i). \tag{3.5}$$


Other choices of $f(p_i, q)$ were investigated in [25] and could be used here as well. This function represents the cost of a single robot $i$ sensing at the position $q$. The quadratic form is appropriate for a variety of sensors including light based sensors, such as cameras or laser scanners, chemical sensors, and others. For tasks in which robots have to drive to a point $q$ for servicing, and we want the cost to be proportional to the distance travelled, it would be more appropriate to use $f(p_i, q) = \|q - p_i\|$, for example. Convergence to critical points of $\mathcal{H}_\alpha$ and stability of isolated local minima can be proved by a direct application of Theorem 2.3 and Theorem 2.4, respectively, from Chapter 2.

$^2$The indicator function for a set $S \subset Q$ returns 1 for $q \in S$, and 0 otherwise.
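The gradient controller with the quadratic sensing cost can be sketched numerically as follows. This is a minimal sketch under stated assumptions, not the thesis implementation: the environment is sampled on a grid, no sample coincides with a robot position (so every $f(p_i, q) > 0$), and the names `soft_weights` and `control` and the gain `k` are hypothetical. The weight $(f_i/g_\alpha)^{\alpha-1}$ is evaluated through the equivalent form $(\sum_j (f_j/f_i)^\alpha)^{(1-\alpha)/\alpha}$ in log space to avoid overflow for large $|\alpha|$.

```python
import numpy as np

def soft_weights(points, robots, alpha):
    """(f_i / g_alpha)**(alpha - 1) for f_i = 0.5 * ||q - p_i||^2, computed in log space.

    Assumes alpha != 0 and that no sample point coincides with a robot (f_i > 0).
    """
    f = 0.5 * ((points[:, None, :] - robots[None, :, :]) ** 2).sum(axis=2)  # (N, n)
    logf = np.log(f)
    # expo[k, i, j] = alpha * (log f_j - log f_i) at sample k
    expo = alpha * (logf[:, None, :] - logf[:, :, None])
    top = expo.max(axis=2, keepdims=True)
    log_sum = top[..., 0] + np.log(np.exp(expo - top).sum(axis=2))
    return np.exp((1.0 - alpha) / alpha * log_sum)

def control(points, weights, robots, alpha, k=1.0):
    """Sampled gradient controller, using df/dp_i = -(q - p_i)."""
    w = soft_weights(points, robots, alpha) * weights[:, None]  # (N, n)
    diff = points[:, None, :] - robots[None, :, :]  # q - p_i, shape (N, n, d)
    return k * (w[:, :, None] * diff).sum(axis=0)

m = 20
axis = (np.arange(m) + 0.5) / m
gx, gy = np.meshgrid(axis, axis, indexing="ij")
points = np.column_stack([gx.ravel(), gy.ravel()])
weights = np.full(len(points), 1.0 / m**2)  # phi(q) = 1 on the unit square
robots = np.array([[0.25, 0.5], [0.75, 0.5]])

u_soft = control(points, weights, robots, alpha=-1.0)
w_sharp = soft_weights(points, robots, alpha=-40.0)  # close to the Voronoi indicator
```

For negative $\alpha$ the weights lie in $(0, 1]$, and for a large negative value such as $-40$ they are close to 1 inside a robot's Voronoi cell and close to 0 outside it, consistent with the indicator-function interpretation.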

Theorem 3.1 (Convergence of Coverage Control) For a group of robots with closed-loop dynamics (3.4) the robots converge to the set of critical points of $\mathcal{H}_\alpha$.

Proof  3.1  The  theorem  is  immediate  from  Theorem  2.3. 

Theorem 3.2 (Stability of Local Minima) For a group of robots with closed-loop dynamics (3.4) only configurations $P^*$ for which $\mathcal{H}_\alpha(P^*)$ is an isolated local minimum are locally asymptotically stable.

Proof  3.2  The  theorem  follows  directly  from  Theorem  2.4. 

Remark 3.1 (Stability Vs. Convergence) Although we already brought attention to this distinction in Chapter 2, it is useful to reiterate the point. Theorem 3.1 says that the system will converge to the set of critical points of $\mathcal{H}_\alpha$, while Theorem 3.2 says that only local minima of $\mathcal{H}_\alpha$ are stable. It is a basic fact of dynamical systems that, given a special set of initial conditions, trajectories may converge to unstable equilibria. However, small perturbations will cause them to leave an unstable equilibrium, while stable equilibria are robust to small perturbations. In our experience cost functions such as $\mathcal{H}_\alpha$ have many saddle points and local maxima, so it is meaningful to specify that the system prefers local minima of $\mathcal{H}_\alpha$.

Remark 3.2 (Network Requirements) The computation of the controller requires that robot $i$ knows the states of all the robots in the network. For this to be feasible there must either be a global supervisor or a fully connected network communication topology. It would be more useful if the controller depended only upon the states of robots with which it communicates. We suggest two methods to accomplish this, but we do not analyze them in detail in this thesis. First, robot $i$ can approximate its control law simply by computing (3.4) using only the states of the robots with which it is in communication. We expect this to give a good approximation because the function $(f(p_j, q)/g_\alpha)^{\alpha - 1}$ depends weakly upon the states of agents that are not Voronoi neighbors, especially for small values of $\alpha$, as evident from Figure 3-3. A rigorous stability analysis of this approximation scheme is difficult, however. A second option is for a robot $i$ to use an estimated configuration vector, $\hat{P}$, in its calculation of the control law. The estimated configuration can be updated online using a standard distributed consensus algorithm (a so called "consensus estimator"). We expect that such a scheme may be amenable to a rigorous stability proof as its architecture is similar to adaptive control architectures. The investigation of these matters is left for future work.


3.3  Deriving  Special  Cases 

In this section we show how the cost function (3.1) can be specialized to give three common kinds of coverage controllers: a Voronoi controller, which is geometric in nature, a minimum variance controller, which has a probabilistic interpretation, and a potential field controller. We conjecture that other coverage objectives beyond these three can be achieved with different choices of the mixing function parameter $\alpha$.

3.3.1  Voronoi  Coverage 

The Voronoi-based coverage controller described in Section 2.5.3 is based on a gradient descent of the cost function

$$\mathcal{H}_V = \sum_{i=1}^n \int_{V_i} \frac{1}{2}\|q - p_i\|^2\,\phi(q)\,dq, \tag{3.6}$$

where $V_i = \{q \in Q \mid \|q - p_i\| \leq \|q - p_j\|,\ \forall j \neq i\}$ is the Voronoi cell of robot $i$ and the use of the subscript $V$ is to distinguish it from $\mathcal{H}$ and $\mathcal{H}_\alpha$. The Voronoi partition can equivalently be written using the min function as

$$\mathcal{H}_V = \int_Q \min_i \Big(\frac{1}{2}\|q - p_i\|^2\Big)\,\phi(q)\,dq, \tag{3.7}$$

because a point $q$ is in the Voronoi cell $V_i$ if and only if $\|q - p_j\|$ is minimized for $j = i$. As noted in Section 3.2.2, $\lim_{\alpha \to -\infty} g_\alpha(f_1, \ldots, f_n) = \min_i f_i$. Therefore $\mathcal{H}_V$ is a special instance of (3.3) with the mixing function $g_{-\infty} = \lim_{\alpha \to -\infty} g_\alpha$ and $f(p_i, q) = \frac{1}{2}\|q - p_i\|^2$.
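The convergence of the mixed cost to the Voronoi cost as the parameter goes to $-\infty$ can be illustrated numerically. The sketch below compares sampled versions of the two costs for the quadratic sensing cost; the grid, the constant sensory function, the random robot positions, and the names `h_alpha` and `h_voronoi` are assumptions made for illustration.

```python
import numpy as np

def h_alpha(points, weights, robots, alpha):
    """Sampled mixed cost with f = 0.5 * ||q - p_i||^2 (requires f > 0 at every sample)."""
    f = 0.5 * ((points[:, None, :] - robots[None, :, :]) ** 2).sum(axis=2)
    fmin = f.min(axis=1, keepdims=True)
    # Factor out the smallest f_i so that large |alpha| does not overflow.
    g = fmin[:, 0] * ((f / fmin) ** alpha).sum(axis=1) ** (1.0 / alpha)
    return (g * weights).sum()

def h_voronoi(points, weights, robots):
    """Sampled Voronoi cost: integral of min_i f(p_i, q) phi(q)."""
    f = 0.5 * ((points[:, None, :] - robots[None, :, :]) ** 2).sum(axis=2)
    return (f.min(axis=1) * weights).sum()

m = 40
axis = (np.arange(m) + 0.5) / m
gx, gy = np.meshgrid(axis, axis, indexing="ij")
points = np.column_stack([gx.ravel(), gy.ravel()])
weights = np.full(len(points), 1.0 / m**2)  # phi(q) = 1 on the unit square
robots = np.random.default_rng(2).random((4, 2))

hv = h_voronoi(points, weights, robots)
alphas = [-1.0, -5.0, -10.0, -100.0]
vals = [h_alpha(points, weights, robots, a) for a in alphas]
```

Since $n^{1/\alpha} \min_i f_i \leq g_\alpha \leq \min_i f_i$ for $\alpha < 0$, the sampled mixed cost approaches the Voronoi cost from below, and the relative gap is at most $1 - n^{1/\alpha}$.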

The  choice  of  the  min  function  for  a  mixing  function  now  warrants  some  reflection. 
Consider  a  distributed  actuation  scenario  in  which  we  want  to  position  robots  so  as  to 
service  an  event  that  occurs  randomly  at  some  point  in  the  environment  q.  Suppose 
any  robot  is  equally  capable  of  rendering  the  service,  robots  have  to  physically  travel 
to  the  event  to  render  the  service,  and  our  objective  is  to  service  an  event  as  quickly 
as  possible.  Naturally,  an  event  should  be  serviced  by  the  robot  that  is  closest  to 
it,  as  it  will  reach  the  event  the  most  quickly.  In  this  case,  the  min  function  is  the 
appropriate  choice  for  a  mixing  function.  By  using  the  min  function  we  are  saying 
that  the  cost  incurred  by  all  the  robots  due  to  the  event  at  q  is  the  same  as  that 
incurred  by  the  robot  that  is  closest  to  q. 

On  the  other  hand,  consider  a  sensing  task  in  which  an  event  of  interest  occurs 
randomly  at  a  point  q  and  is  sensed  at  a  distance  by  sensors  located  on  the  robots. 
In  this  case  the  use  of  the  min  function  is  more  difficult  to  justify.  Using  the  min 
function in this instance would imply that even though both $p_i$ and $p_j$ have some
sensory  information  about  the  event,  the  cost  function  only  counts  the  information 
from  the  one  that  is  closest  to  q.  This  seems  to  be  a  poor  choice  of  cost  function  for 
sensing,  since  in  such  cases  we  would  want  to  capture  the  intuition  that  two  sensors 
are  better  than  one.  The  mixing  function  (3.2)  captures  this  intuition.  Furthermore, 
even  in  distributed  actuation  tasks,  using  a  continuous  approximation  to  the  Voronoi 
cell  improves  the  robustness  of  the  controller.  The  discrete,  geometric  nature  of  the 
Voronoi  computation  combined  with  the  continuous  controller  can  lead  to  chattering, 
and  small  sensing  errors  can  result  in  large  changes  in  the  control  input.  Fortunately, 
the Voronoi tessellation can be approximated arbitrarily well by choosing a sufficiently
negative value of $\alpha$, thereby preserving the Voronoi controller behavior while improving robustness.
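
To see numerically how the mixing function approaches the min function, the following minimal sketch evaluates $g_\alpha$ for increasingly negative $\alpha$. It assumes the form $g_\alpha(f_1, \ldots, f_n) = (\sum_i f_i^\alpha)^{1/\alpha}$ for (3.2), which is consistent with the expression for $g_{-1}$ used in Section 3.3.2. The function name `mixing` and the sample costs are illustrative choices, not part of the original development.

```python
def mixing(costs, alpha):
    """Mixing function g_alpha = (sum_i f_i**alpha)**(1/alpha).

    Assumed form of (3.2); requires every cost f_i > 0 and alpha != 0.
    """
    return sum(f ** alpha for f in costs) ** (1.0 / alpha)


# Hypothetical per-robot costs f(p_i, q) at a single point q.
costs = [0.5, 2.0, 4.0]

for alpha in (-1, -5, -20, -100):
    print(alpha, mixing(costs, alpha))
# As alpha decreases, the printed value rises toward min(costs) = 0.5
# without exceeding it: every additional sensor lowers the mixed cost a little.
```

For moderate values such as $\alpha = -1$ the contribution of the more distant robots is clearly visible, while for very negative $\alpha$ only the closest robot matters, which is the Voronoi behavior.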


3.3.2  Minimum  Variance  Coverage 


We show in this section that setting the mixing function parameter to $\alpha = -1$ causes
the  robots  to  minimize  the  expected  variance  of  their  measurement  of  the  location  of 
a  target  of  interest.  As  a  side  effect,  we  will  formulate  an  optimal  Bayesian  estimator 
for  the  location  of  the  target  given  the  measurements  of  the  agents. 

Suppose  our  agents  are  equipped  with  sensors  that  give  a  noisy  measurement  of 
the  position  of  a  target  in  the  environment.  Let  the  target  position  be  given  by 
a  random  variable  q  that  takes  on  values  in  Q,  and  agent  i  gives  a  measurement 
$y_i = q + w$, where $w \sim \mathcal{N}(0, I_2 \sqrt{f(p_i, q)})$ is a bi-variate normally distributed random
variable, and where $I_2$ is the $2 \times 2$ identity matrix. The variance of the measurement,
$f(p_i, q)$, is a function of the position of the sensor and the target. Intuitively one
would expect a sensor to localize a target with more precision the closer the target
is to the sensor. Then the measurement likelihood of agent $i$ is

\[
P(y_i \mid q : p_i) = \frac{1}{2\pi f(p_i, q)} \exp\left\{ -\frac{\|y_i - q\|^2}{2 f(p_i, q)} \right\},
\]

and the notation $P(\cdot : p_i)$ is to emphasize
that the distribution is a function of the agent position. Assume the measurements
of different agents conditioned on the target position are independent. Also, let $\phi(q)$
be the prior distribution of the target's position. Then Bayes rule gives the posterior
distribution,

\[
P(q \mid y_1, \ldots, y_n) = \frac{\prod_{i=1}^{n} P(y_i \mid q : p_i)\, \phi(q)}{\int_Q \prod_{i=1}^{n} P(y_i \mid q : p_i)\, \phi(q)\, dq}. \tag{3.8}
\]


One  can  use  the  posterior  to  obtain  a  Bayesian  estimate  of  the  position  of  the  event  q 
given  the  measurements.  For  example,  one  may  choose  to  estimate  q  using  the  mean, 
the  median,  or  the  maximum  of  the  posterior  in  (3.8). 

Our  interest  here,  however,  is  not  in  estimating  q.  Instead  we  are  interested  in 
positioning  the  robots  so  that  whatever  estimate  of  q  is  obtained  is  the  best  possible 
one.  To  this  end,  we  seek  to  position  the  robots  to  minimize  the  variance  of  their 
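combined sensor measurements. The product of measurement likelihoods in the numerator
of (3.8) can be simplified to a single likelihood function, which takes the form
of an un-normalized Gaussian

\[
\prod_{i=1}^{n} P(y_i \mid q : p_i) = A \exp\left\{ -\frac{\|\bar{y} - q\|^2}{2 g_{-1}(\cdot)} \right\}, \tag{3.9}
\]

whose variance is equivalent to our mixing function $g_{-1}(\cdot) = \left( \sum_{i=1}^{n} f(p_i, q)^{-1} \right)^{-1}$.
The values of $A$ and $\bar{y}$ are not important in this context, though we state them for
completeness:

\[
\bar{y} = g_{-1}(\cdot) \sum_{i=1}^{n} f(p_i, q)^{-1} y_i, \quad \text{and}
\]

\[
A = \frac{1}{(2\pi)^n \prod_{i=1}^{n} f(p_i, q)} \exp\left\{ \frac{1}{2} \|\bar{y}\|^2 g_{-1}(\cdot)^{-1} - \frac{1}{2} \sum_{i=1}^{n} \|y_i\|^2 f(p_i, q)^{-1} \right\}.
\]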


If  we  want  to  position  the  robots  so  as  to  obtain  the  most  decisive  information  from 
their  sensors,  we  should  move  them  to  minimize  this  variance.  Notice,  however,  that 
$g_{-1}(f(p_1, q), \ldots, f(p_n, q))$ is a random variable since it is a function of $q$. Taking the
expectation over $q$ of the likelihood variance gives our original cost function,

\[
\mathcal{H}_{-1} = E_q\left[ g_{-1}(f(p_1, q), \ldots, f(p_n, q)) \right] = \int_Q g_{-1}(f(p_1, q), \ldots, f(p_n, q))\, \phi(q)\, dq. \tag{3.10}
\]

Thus  we  can  interpret  the  coverage  control  optimization  as  finding  the  agent  positions 
that  minimize  the  expected  variance  of  the  likelihood  function  for  an  optimal  Bayes 
estimator  of  the  position  of  the  target. 
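
The claim that the product of Gaussian measurement likelihoods collapses to a single un-normalized Gaussian with variance $g_{-1}$ can be checked numerically. The sketch below is an illustration under the measurement model stated above (isotropic bivariate noise with variance $f(p_i, q)$). The helper names and the sample numbers are hypothetical, and the constant `A` is obtained by completing the square in the exponent, which is an assumption of this sketch.

```python
import math


def likelihood(y, q, var):
    """Isotropic bivariate Gaussian density of measurement y given target q."""
    d2 = (y[0] - q[0]) ** 2 + (y[1] - q[1]) ** 2
    return math.exp(-d2 / (2.0 * var)) / (2.0 * math.pi * var)


def fuse(ys, variances):
    """Return (g, y_bar, A) with prod_i likelihood(y_i, q, f_i)
    equal to A * exp(-||y_bar - q||^2 / (2 g)) for every q."""
    g = 1.0 / sum(1.0 / f for f in variances)  # g_{-1}: fused variance
    y_bar = [g * sum(y[k] / f for y, f in zip(ys, variances)) for k in (0, 1)]
    s = sum((y[0] ** 2 + y[1] ** 2) / f for y, f in zip(ys, variances))
    yb2 = y_bar[0] ** 2 + y_bar[1] ** 2
    norm = (2.0 * math.pi) ** len(ys) * math.prod(variances)
    A = math.exp(0.5 * yb2 / g - 0.5 * s) / norm
    return g, y_bar, A


# Hypothetical measurements y_i and variances f(p_i, q) for three agents.
ys = [(0.4, 0.6), (0.5, 0.5), (0.7, 0.4)]
variances = [0.2, 0.1, 0.4]
q = (0.45, 0.55)

g, y_bar, A = fuse(ys, variances)
direct = math.prod(likelihood(y, q, f) for y, f in zip(ys, variances))
d2 = (y_bar[0] - q[0]) ** 2 + (y_bar[1] - q[1]) ** 2
fused = A * math.exp(-d2 / (2.0 * g))
print(direct, fused)       # the two values agree up to rounding
print(g, min(variances))   # fused variance is below every single-sensor variance
```

The second printed line reflects the intuition that two sensors are better than one: the fused variance is always smaller than the smallest individual variance.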


A more theoretically appealing criterion would be to position the agents to minimize
the variance of the posterior distribution in (3.8). This gives a considerably
more complicated cost function,

\[
\operatorname{Var}[q \mid y_1, \ldots, y_n] = \frac{\int_Q \prod_{i=1}^{n} P(y_i \mid q : p_i)\, \phi(q)\, q q^T\, dq}{\int_Q \prod_{i=1}^{n} P(y_i \mid q : p_i)\, \phi(q)\, dq} - \bar{q} \bar{q}^T, \tag{3.11}
\]

where

\[
\bar{q} = E[q \mid y_1, \ldots, y_n] = \frac{\int_Q \prod_{i=1}^{n} P(y_i \mid q : p_i)\, \phi(q)\, q\, dq}{\int_Q \prod_{i=1}^{n} P(y_i \mid q : p_i)\, \phi(q)\, dq}. \tag{3.12}
\]

The complexity of this cost function and the fact that its gradients cannot be easily
computed make it a less practical option.


3.3.3  Potential  Field  Coverage 

The  third  type  of  coverage  controller  we  consider  is  significantly  different  from  the 
previous  two  in  that  it  does  not  involve  an  integral  over  the  environment.  Instead 
it  relies  on  the  idea  that  robots  should  push  away  from  one  another  to  spread  out 
over  an  environment,  but  should  not  move  too  far  from  one  another  or  else  they  will 
become  disconnected.  Surprisingly,  however,  we  will  show  that  this  rather  different 
coverage philosophy can be reconciled with our generalized coverage cost function $\mathcal{H}_\alpha$
in  (3.3). 

Let the importance function, $\phi(q)$, be given as a sum of delta-Dirac functions
centered at each of the robot positions

\[
\phi(q) = \sum_{i=1}^{n} \delta(\|q - p_i\|). \tag{3.13}
\]

Substituting this for $\phi(q)$ in (3.3), the integral in $\mathcal{H}_\alpha$ can then be evaluated analytically
to give $\mathcal{H}_{\text{pot}} = \sum_{i=1}^{n} g_\alpha(f(p_1, p_i), \ldots, f(p_n, p_i))$, and with $\alpha = 1$ we get

\[
\mathcal{H}_{\text{pot}} = \sum_{i=1}^{n} \sum_{j=1, j \ne i}^{n} f(p_j, p_i), \tag{3.14}
\]

which is a common cost function for potential field based models for herding and
consensus, where $f(p_j, p_i)$ can be interpreted as an inter-agent potential function.
One choice for $f(p_j, p_i)$ is

\[
f(p_j, p_i) = \frac{1}{6} \|p_j - p_i\|^{-2} - \|p_j - p_i\|^{-1}, \tag{3.15}
\]

which, taking the gradient of (3.14), yields the controller

\[
\dot{p}_i = k \sum_{j=1, j \ne i}^{n} \left( \|p_j - p_i\|^{-2} - \frac{1}{3} \|p_j - p_i\|^{-3} \right) \frac{p_j - p_i}{\|p_j - p_i\|}. \tag{3.16}
\]


Controllers  similar  to  this  one  have  been  studied  in  a  number  of  works,  for  example 
[29,35,44, 104].  There  are  numerous  variations  on  this  simple  theme  in  the  literature. 
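
A minimal sketch of such a potential field controller is given below. It assumes the inter-agent potential $f(r) = \frac{1}{6} r^{-2} - r^{-1}$ with $r = \|p_j - p_i\|$, absorbs constant factors into the gain, and uses a simple forward-Euler discretization. The gain `k`, the step size `dt`, and the function names are illustrative assumptions. For two robots the pairwise term vanishes at $r = 1/3$, so under these assumptions they should settle at that separation.

```python
import math


def pair_gain(r):
    """Derivative f'(r) of the assumed potential f(r) = r**-2 / 6 - r**-1.

    Positive values pull robot i toward robot j, negative values push it away.
    """
    return r ** -2 - r ** -3 / 3.0


def step(positions, k=1.0, dt=1e-3):
    """One forward-Euler step of p_i' = k * sum_j gain(r_ij) * (p_j - p_i) / r_ij."""
    new_positions = []
    for i, (xi, yi) in enumerate(positions):
        vx = vy = 0.0
        for j, (xj, yj) in enumerate(positions):
            if i == j:
                continue
            r = math.hypot(xj - xi, yj - yi)
            gain = pair_gain(r)
            vx += k * gain * (xj - xi) / r
            vy += k * gain * (yj - yi) / r
        new_positions.append((xi + dt * vx, yi + dt * vy))
    return new_positions


# Two robots starting one unit apart (hypothetical initial condition).
robots = [(0.0, 0.0), (1.0, 0.0)]
for _ in range(5000):
    robots = step(robots)

separation = math.hypot(robots[1][0] - robots[0][0], robots[1][1] - robots[0][1])
print(separation)  # settles near the zero of pair_gain, r = 1/3
```

The attraction at long range keeps the robots from drifting apart, while the repulsion at short range spreads them out, which is the behavior described at the start of this section.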


3.3.4  Computational  Complexity 

The  gradient  controllers  described  in  this  work  must  inevitably  be  discretized  and 
implemented  in  a  discrete  time  control  loop.  A  common  criticism  of  the  Voronoi 
based  coverage  controller  is  that  it  is  computationally  intensive.  At  each  iteration 
of  the  loop,  a  robot  must  re-compute  its  Voronoi  cell.  Additionally,  the  controller 
must  compute  spatial  integrals  over  a  region.  In  general,  a  discretized  approximation 
must be used to compute the integral of $\phi(q)$ over the Voronoi cell, which is again
computationally intensive. The two parameters that are important for computation
time are the number of robots $n$ and the number of grid squares in the integral
computation, which we will call $m$. The typical decentralized algorithm for a single
robot to compute its Voronoi cell (from [26]) runs in $O(n)$ time. The time complexity
for computing a discretized integral is linear in the number of grid squares, and at
each grid square requires a check if the center point is in the Voronoi cell, which is
an $O(n)$ operation. Therefore the time complexity of the integral is in $O(nm)$. The
Voronoi cell must be computed first, followed by the discretized integral, therefore
the standard Voronoi controller has time complexity $O(n(m + 1))$ at each step of the
control loop.

Our  controller  in  (3.4)  does  not  require  the  computation  of  a  Voronoi  cell,  but  it 
does  require  the  discretized  spatial  integral  over  the  environment.  We  do  not  have  to 
check if a point is in a polygon, but the integrand we evaluate, namely $g_\alpha$, is linear in $n$.
Therefore the integral computation still has time complexity $O(nm)$, which is the time
complexity of the controller at each step of the control loop. Yet as $\alpha$ decreases, the
behavior of the controller approaches that of the Voronoi controller. The controller
we propose in this chapter is therefore significantly simpler in implementation (since
it does not require the Voronoi computation), and it is slightly faster computationally
($O(nm)$ rather than $O(n(m + 1))$ per step).
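
The complexity argument can be made concrete with a sketch of the discretized cost evaluation. The code below assumes the mixing function form $g_\alpha = (\sum_i f_i^\alpha)^{1/\alpha}$ with $\alpha < 0$, the cost $f(p_i, q) = \frac{1}{2}\|q - p_i\|^2$, a uniform grid over the unit square, and a hypothetical importance function `phi`. None of these names come from the original. Both evaluations visit each of the $m$ grid cells once and do $O(n)$ work per cell, and the mixing function version needs no Voronoi cell or point-in-polygon test.

```python
def cell_centers(side):
    """Centers of a side x side grid over the unit square (m = side**2 cells)."""
    h = 1.0 / side
    return [((a + 0.5) * h, (b + 0.5) * h) for a in range(side) for b in range(side)]


def cost(p, q):
    """f(p_i, q) = 0.5 * ||q - p_i||^2."""
    return 0.5 * ((q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)


def mixing(costs, alpha):
    """Assumed mixing function for alpha < 0, rescaled by the smallest cost
    so that very negative alpha does not overflow."""
    f_min = min(costs)
    if f_min == 0.0:
        return 0.0  # limit of g_alpha for alpha < 0 when one cost vanishes
    return f_min * sum((f / f_min) ** alpha for f in costs) ** (1.0 / alpha)


def h_alpha(robots, phi, alpha, side=40):
    """Discretized H_alpha: m cells, O(n) work per cell, O(nm) in total."""
    area = 1.0 / side ** 2
    return sum(mixing([cost(p, q) for p in robots], alpha) * phi(q) * area
               for q in cell_centers(side))


def h_voronoi(robots, phi, side=40):
    """Discretized H_V written with the min function, also O(nm)."""
    area = 1.0 / side ** 2
    return sum(min(cost(p, q) for p in robots) * phi(q) * area
               for q in cell_centers(side))


robots = [(0.2, 0.3), (0.7, 0.6), (0.5, 0.9)]  # hypothetical positions


def phi(q):
    """Uniform importance function, chosen only for illustration."""
    return 1.0


hv = h_voronoi(robots, phi)
h1 = h_alpha(robots, phi, -1)
h50 = h_alpha(robots, phi, -50)
print(hv, h1, h50)  # h50 is close to hv, h1 is noticeably smaller
```

As $\alpha$ becomes more negative the discretized $\mathcal{H}_\alpha$ approaches the discretized Voronoi cost from below, in line with the limit discussed in Section 3.3.1.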


3.4  Convexity  and  Consensus 

Since  we  treat  the  multi-agent  coverage  problem  as  an  optimization,  it  is  natural  to 
ask  what  sort  of  optimization  we  are  dealing  with,  and  what  optimization  tools  can 
be  brought  to  bear  to  solve  it.  We  show  in  this  section  that  the  cost  function  in  (3.3) 
is  nonconvex,  and  that  nonconvexity  is  a  required  feature  of  a  large  class  of  multi¬ 
agent  problems,  however  undesirable  this  may  be  from  an  optimization  perspective. 
Specifically,  we  demonstrate  a  link  between  the  convexity  of  a  cost  function  and  the 
multi-agent phenomenon known as consensus. For our purposes, consensus describes a
multi-agent configuration in which all agents take on the same state, $p_1 = p_2 = \ldots = p_n$.
Consensus is geometrically represented in the state space $\mathcal{P}^n$ as a $d_{\mathcal{P}}$-dimensional
hyperplane that passes through the origin (from the $d_{\mathcal{P}}(n - 1)$ independent equality
constraints). This is illustrated by the diagonal line in Figure 3-5 in a simplified 2D
setting.  We  will  prove,  with  some  technical  assumptions,  that  a  multi-agent  problem 
with  a  convex  cost  function  admits  at  least  one  globally  optimal  consensus  solution. 
Figure  3-4  shows  a  graphical  schematic  illustrating  the  meaning  of  the  theorem. 

Consider a general multi-agent cost function $\mathcal{H} : \mathcal{P}^n \mapsto \mathbb{R}$. As before, an agent
$i$ has a state $p_i \in \mathcal{P} \subset \mathbb{R}^{d_{\mathcal{P}}}$. It will be more convenient in this section to refer to a
configuration of agents as a tuple $(p_1, \ldots, p_n) \in \mathcal{P}^n$, rather than the column vector
notation used previously. Let us assume that agents are anonymous with respect to
the cost function, by which we mean that the positions of any two agents can be
interchanged without affecting the value of the cost function. This is formalized by
the following assumption.

Assumption 3.1 (Anonymity of Agents) The cost function $\mathcal{H}$ is such that

\[
\mathcal{H}(\ldots, p_i, \ldots, p_j, \ldots) = \mathcal{H}(\ldots, p_j, \ldots, p_i, \ldots) \quad \forall i, j \in \{1, \ldots, n\}. \tag{3.17}
\]




Figure  3-4:  This  schematic  illustrates  the  meaning  of  Theorem  3.3  and  Corollary  3.1. 

If the cost function $\mathcal{H}$ is convex, the robots will all move to the same position, a
behavior called consensus.

Assumption 3.1 is in keeping with the ethos of multi-robot systems, where the emphasis
is on the global patterns that result from the interactions of many identical
robots. Furthermore, let us assume that $\mathcal{H}$ and $\mathcal{P}^n$ satisfy at least one of the three
properties in Theorem 2.1, which is simply to say that the set of minima of $\mathcal{H}$ over
$\mathcal{P}^n$ is non-empty. Now we give the main result of this section.

Theorem 3.3 (Convexity and Consensus) Under Assumption 3.1, if the cost function
$\mathcal{H}(p_1, \ldots, p_n)$ is convex, $\mathcal{P}^n$ is convex, and one of the conditions in Theorem 2.1 is
satisfied, then $\mathcal{H}(p_1, \ldots, p_n)$ has a global minimum such that $p_i = p_j\ \forall i, j \in \{1, \ldots, n\}$.

Proof 3.3 Our argument rests upon Assumption 3.1 and the fact from Theorem 2.1
that the set of minima of the convex function $\mathcal{H}$ is a convex set. Let $\mathcal{H}^*$ be the set of
minima, and let $(\ldots, p_i^*, \ldots, p_j^*, \ldots)$ be an optimal solution in that set. By Assumption
3.1, $(\ldots, p_j^*, \ldots, p_i^*, \ldots)$ is also an optimal solution for any $i$ and $j$. Since every
permutation is a composition of such pairwise swaps, all
permutations of components in $(p_1^*, \ldots, p_n^*)$ are optima. Then by convexity of $\mathcal{H}^*$, all
convex combinations of points in $\mathcal{H}^*$ are in $\mathcal{H}^*$. In particular, the point $(\bar{p}, \ldots, \bar{p})$,
where $\bar{p} = \frac{1}{n} \sum_{i=1}^{n} p_i^*$, is an optimal solution (since it is a convex combination of
permutations of $(p_1^*, \ldots, p_n^*)$, for example the equally weighted average of its $n$ cyclic shifts).

Figure 3-5: This schematic shows the geometrical intuition behind the proof of Theorem
3.3 in a simplified 2D setting. Corollary 3.1 is proved by noticing that the set
of minima is a single point (the consensus solution) if $\mathcal{H}$ is strictly convex.

We  show  a  geometric  schematic  of  the  proof  argument  in  Figure  3-5.  The  proof 
uses  the  fact  that  the  convex  set  of  minima  must  intersect  the  consensus  hyperplane 
(the hyperplane where $p_i = p_j\ \forall i, j$) in at least one point. A simple corollary follows.

Corollary 3.1 (Strict Convexity) If the conditions of Theorem 3.3 are met and
the cost function $\mathcal{H}(p_1, \ldots, p_n)$ is strictly convex, then the minimum is unique and is
such that $p_i = p_j\ \forall i, j \in \{1, \ldots, n\}$.


Proof 3.4 A strictly convex function has at most one minimum over a convex domain.
Since Theorem 3.3 guarantees that a consensus minimum exists, the unique minimum
must be that consensus configuration.
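
The averaging step in the proof of Theorem 3.3 can be illustrated on a toy problem. The sketch below uses a hypothetical convex, anonymous cost over two scalar agents, $\mathcal{H}(p_1, p_2) = (p_1 + p_2 - 2)^2$, whose minima form the line $p_1 + p_2 = 2$. The function names are illustrative. Averaging a minimizer with its permutations lands on the consensus line without increasing the cost.

```python
from itertools import permutations


def cost(p):
    """Hypothetical convex cost that is invariant to swapping agents."""
    return (sum(p) - 2.0) ** 2


def average_of_permutations(p):
    """Equally weighted convex combination of all permutations of the tuple p."""
    perms = list(permutations(p))
    n = len(p)
    return tuple(sum(perm[k] for perm in perms) / len(perms) for k in range(n))


p_star = (0.5, 1.5)                      # a non-consensus minimizer
p_bar = average_of_permutations(p_star)  # every component equals mean(p_star)

print(cost(p_star), p_bar, cost(p_bar))  # prints: 0.0 (1.0, 1.0) 0.0
```

This cost is convex but not strictly convex, so the consensus point is one of many minima, as in Theorem 3.3. A strictly convex cost would leave the consensus point as the only minimum, as in Corollary 3.1.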

Remark 3.3 (Consensus vs. Non-consensus) Theorem 3.3 suggests that it is futile
to search for convex cost functions for multi-robot problems other than consensus.
It  delineates  two  classes  of  multi-agent  behaviors  reminiscent  of  complexity  classes  in 
the  theory  of  computation.  One  class,  which  we  will  call  consensus  behaviors,  can  be 
described  as  optimizing  a  convex  cost  function.  The  other  class,  which  we  will  call 
non-consensus  behaviors,  is  fundamentally  different  in  that  it  can  only  be  described 
with nonconvex cost functions. This is important because if we wish to design an
optimization  to  solve  a  multi-agent  problem,  and  we  know  that  the  problem  cannot 
be  solved  satisfactorily  by  all  the  agents  taking  the  same  state,  then  we  must  use  a 
nonconvex  cost  function.  Likewise  if  we  observe  a  multi-agent  behavior  in  nature 
which  cannot  be  described  by  all  agents  reaching  the  same  state  (the  construction  of 
a  termite  nest,  for  example),  then  an  optimization-based  explanation  of  this  behavior 
must  be  nonconvex. 


Remark 3.4 (Coverage is Nonconvex) This is directly applicable to coverage problems.
Indeed, coverage cannot be achieved with all agents moving to the same place,
therefore coverage problems must involve the optimization of a nonconvex cost function.
Our parameterized cost function $\mathcal{H}_\alpha$ from (3.3) is nonconvex for $\alpha < 1$, in which
regime it corresponds to a coverage task (e.g. $\alpha \to -\infty$ for Voronoi and $\alpha = -1$ for
minimum variance). It becomes convex (assuming $f$ is convex) for $\alpha \ge 1$, in which
regime it results in consensus. Theorem 3.3 explains why this is the case.
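
A single midpoint test is enough to exhibit the nonconvexity for $\alpha < 1$. The sketch below is an illustration that assumes the mixing function form $g_\alpha = (\sum_i f_i^\alpha)^{1/\alpha}$, two scalar robots, a single event location $q = 0$, and $f(p_i, q) = \frac{1}{2}(q - p_i)^2$. These choices are hypothetical. A convex function satisfies $\mathcal{H}((x + y)/2) \le (\mathcal{H}(x) + \mathcal{H}(y))/2$. The test violates this inequality for $\alpha = -1$ and satisfies it at the same points for $\alpha = 2$, which is consistent with convexity but does not prove it.

```python
def h_point(p, alpha, q=0.0):
    """Mixed cost of one event at q for scalar robot positions p (no p_i == q)."""
    costs = [0.5 * (q - pi) ** 2 for pi in p]
    return sum(f ** alpha for f in costs) ** (1.0 / alpha)


def midpoint_gap(x, y, alpha):
    """H(midpoint) minus the average of H(x) and H(y).

    A positive value contradicts convexity.
    """
    mid = tuple((a + b) / 2.0 for a, b in zip(x, y))
    return h_point(mid, alpha) - 0.5 * (h_point(x, alpha) + h_point(y, alpha))


# Two configurations that are permutations of each other.
x, y = (1.0, 0.1), (0.1, 1.0)

print(midpoint_gap(x, y, -1))  # positive: the alpha = -1 cost is nonconvex
print(midpoint_gap(x, y, 2))   # negative: no violation at these points
```

The midpoint of the two permuted configurations is a consensus configuration, so the positive gap for $\alpha = -1$ shows directly that moving both robots to the same place is worse for coverage than keeping them apart.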


Remark 3.5 (Future Directions) From an algorithmic point of view, this is unfortunate.
Convex optimization has a powerful and well characterized tool set guaranteed
to reach global minima, but nonconvex optimization requires searching out special
cases and special cost function properties. Often one must be satisfied with local minima.
Distributed coverage controllers that use gradient methods (such as those in
this chapter) guarantee convergence to local minima, which is all one can expect in a
general nonconvex setting. This points towards at least two open questions for future
work: can we find nonconvex multi-robot cost functions that have special properties
that guarantee global results? For example can we find multi-robot cost functions for
which all minima are global (all minima have the same cost) or for which there is only
one minimum even though the function is nonconvex? Alternately, are there nonconvex
optimization methods not based on gradient descent that can be implemented in a
multi-agent setting?




3.5  Simulation  Results 


The controllers for the three scenarios described in Section 3.3 were simulated in a
Matlab environment. The environment $Q$ was taken to be a unit square, and the
function $\phi(q)$ was set to be the sum of two Gaussian functions, one centered at (.2, .2)
and the other at (.8, .8), both with variance .2. We expect to see a higher density of
robots around areas of large $\phi(q)$. In our case, the robots group around the Gaussian
centers.

The results of a simulation with ten robots using the Voronoi based controller,
which corresponds to $\alpha \to -\infty$, are shown in Figs. 3-6(a) and 3-6(b). Similar plots
are shown for the minimum variance controller, with $\alpha = -1$, in Figs. 3-6(c) and
3-6(d). Comparison of the two controllers shows that the Voronoi based controller
causes the robots to spread out more, while as $\alpha$ increases, the robots group more
closely together. When $\alpha \ge 1$, the cost function becomes convex, and the robots
all move to the same position, which corroborates our results relating convexity to
consensus (this is not shown in the plots).

The third scenario, shown in Figs. 3-6(e) and 3-6(f), uses the potential field controller
from (3.16). This controller uses a sum of delta-Dirac functions for $\phi(q)$ rather
than a sum of Gaussians, which causes the robots to arrange themselves in a close-packed
lattice pattern.


3.6  Synopsis 

In  this  chapter  we  introduced  a  unifying  optimization  framework  for  multi-robot 
control  that  brings  together  several  different  existing  algorithms.  We  point  out  that 
important  properties  of  the  underlying  objective  are  embodied  in  the  way  sensor 
information  or  actuator  capabilities  are  combined  from  different  robots.  We  propose 
a  parameterized  function  to  accomplish  this  combination,  where  different  parameter 
values are shown to lead to different kinds of multi-robot algorithms. Finally, we prove
that for all multi-robot problems other than consensus, the underlying optimization is
necessarily nonconvex, making global optimization an unrealistic objective in general,
especially  for  gradient  descent  controllers.  Looking  towards  future  research,  this 
motivates  a  search  for  special  classes  of  nonconvex  functions  that  admit  stronger 
convergence  guarantees,  and  a  search  for  distributed  controllers  other  than  those 
based  on  gradient  methods  that  may  lead  to  stronger  convergence  guarantees. 

Figure 3-6: Trajectories and final configurations are shown for ten robots using the
gradient control law with the Voronoi controller (3-6(a), 3-6(b)), the minimum variance
controller (3-6(c), 3-6(d)), and a potential field controller (3-6(e), 3-6(f)). The
Voronoi tessellation is shown for all scenarios for comparison, even though the right
two controllers do not use the Voronoi cells for control.




Chapter  4 


Incorporating  Learning 

4.1  Introduction

In  this  chapter  we  address  the  question  of  how  to  control  the  multi-robot  system 
when the sensory function, $\phi(q)$, which represents the weighting of importance over
the environment, is not known beforehand. We build upon the gradient controller
developed  in  Chapter  3,  incorporating  a  parameter  tuning  mechanism  to  learn  the 
sensory  function  in  a  provably  stable  way.  The  control  strategy  can  be  thought  of 
as  proceeding  simultaneously  in  two  spaces.  In  the  space  of  robot  positions,  the 
robots  move  to  minimize  the  cost  function  representing  the  collective  sensing  cost  of 
the  network.  At  the  same  time,  in  a  high-dimensional  parameter  space,  each  robot 
adapts a parameter vector to learn$^1$ the distribution of sensory information in the
environment.  We  prove  that  the  robots  eventually  reach  a  near-optimal  configuration, 
and  if  their  paths  are  sufficiently  rich,  they  reach  an  optimal  configuration.  An 
overview of the control strategy is shown in Figure 4-1.

We first describe a learning law in which each robot uses only its own sensor
measurements. We then include a consensus term in the learning law to couple the
learning among neighboring robots. The main effect of this coupling is that sensor
measurements from any one robot propagate around the network to be used by all
robots. All robots eventually learn the same function incorporating all the sensor
measurements collected by all the robots.

$^1$ We will use the words learning and adaptation interchangeably. Learning and adaptation are
specifically meant in the sense of parameter tuning, as in adaptive control, rather than the broader
meaning often used in Biology and Bio-inspired applications.

Figure 4-1: An overview of the decentralized control scheme is shown. The robots, at
positions $p_i$, $p_j$, and $p_k$, spread out over the area, $Q$, to reach optimal final positions.
Simultaneously, each robot adapts a parameter vector ($\hat{a}_i$, $\hat{a}_j$, and $\hat{a}_k$) to build an
approximation of the sensory environment. The parameter vectors for neighboring
robots are coupled in such a way that their final value, $a$, is the same for all robots
in the network.


4.1.1  Related  Work 

We  use  the  notion  of  an  optimal  sensing  configuration  developed  in  [26]  and  build  upon 
it  a  parameter  adaptation  mechanism  similar  to  what  is  used  in  adaptive  control  [67, 
89,103].  Our  emphasis  in  this  chapter  is  on  incorporating  learning  to  enable  optimal 
coverage  of  an  unfamiliar  environment.  This  is  in  contrast  to  [26]  and  other  papers 
that  use  the  same  optimization  framework  (e.g.  [25,77,87])  in  which  the  distribution 
of  sensory  information  in  the  environment  is  required  to  be  known  a  priori  by  all 
robots.  This  a  priori  requirement  was  first  relaxed  in  [95]  by  introducing  a  controller 
with  a  simple  memoryless  approximation  from  sensor  measurements.  The  controller 
was  demonstrated  in  hardware  experiments,  though  a  stability  proof  was  not  found. 
In  the  present  work  we  remove  this  a  priori  requirement  by  introducing  an  adaptive 
controller inspired by the architecture in [88]. The results in this chapter elaborate
and  improve  upon  our  previous  works  [99, 100]. 

It  is  found  that  when  each  robot  uses  only  its  own  sensor  measurements  to  learn 
the  distribution  of  sensory  information,  learning  performance  can  be  sluggish.  We 
address this problem by including a consensus term$^2$ in the parameter adaptation
law. Consensus phenomena have been studied in many fields, and appear ubiquitously
in biological systems of all scales. However, they have only recently yielded
to rigorous mathematical treatment; first in the distributed and parallel computing
community [9, 10, 107, 108] in discrete time, and more recently in the controls community
in continuous time [13,27,46,71,111,112]. In the present work, consensus
is used to learn the distribution of sensory information in the environment in a decentralized
way by propagating sensor measurements gathered by each robot around
the  network.  This  is  similar  to  distributed  filtering  techniques  that  have  recently 
been  introduced,  for  example  in  [62,117],  though  in  contrast  to  those  works,  we  are 
concerned  with  maintaining  provable  stability  of  the  combined  learning  and  control 
system.  Consensus  improves  the  quality  and  speed  of  learning,  which  in  turn  causes 
the  robots  to  converge  more  quickly  to  their  optimal  positions. 

4.1.2  Contributions 

In  short,  the  main  contribution  of  this  chapter  is: 

1.  To  provide  a  controller  that  uses  parameter  adaptation  to  accomplish  coverage 
without  a  priori  knowledge  of  the  sensory  environment.  A  consensus  term  is 
used  within  the  parameter  adaptation  law  to  propagate  sensory  information 
among  the  robots  in  the  network.  Using  a  Lyapunov-like  proof,  we  show  that 
the control law causes the network to converge to a near-optimal sensing configuration,
and if the robots' paths are sufficiently rich, the network will converge
to an optimal configuration.

$^2$ The phenomenon of decentralized consensus is known by many names including flocking, herding,
swarming,  agreement,  rendezvous,  gossip  algorithms,  and  oscillator  synchronization.  All  of  these 
are,  at  root,  the  same  phenomenon — convergence  of  the  states  of  a  group  of  dynamical  systems  to  a 
common  final  vector  (or  manifold)  through  local  coupling. 

This chapter is organized as follows. In Section 4.2 we state the main assumptions
and definitions to set up the problem. Section 4.3 introduces the learning controller
in its simplest form, in which each robot learns an approximation of the sensor function
independently of the other robots. The learning controller is refined in Section
4.4 by coupling the learning between neighboring robots using a consensus term in
the parameter adaptation law. Section 4.5 generalizes the controller to the broadest
context: that of a distributed gradient controller whose cost function has a linearly
parameterized gradient. We analyze the convergence rate of the learning law in Section
4.6, and consider several alternative stable learning laws in Section 4.7. Finally,
numerical simulations are given in Section 4.8 and a synopsis is provided in Section
4.9.


4.2  Problem  Formulation 


We  will  use  the  Voronoi  based  controller  described  in  Chapter  2.19  as  the  foundation 
for  building  the  adaptive  architecture  and  for  proving  convergence  qualities.  Recall 
the  Voronoi  cost  function 


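\[
\mathcal{H}(p_1, \ldots, p_n) = \sum_{i=1}^{n} \int_{V_i} \frac{1}{2} \|q - p_i\|^2 \phi(q)\, dq, \tag{4.1}
\]

and recall the mass, first moment, and centroid of a Voronoi region $V_i$ are given by

\[
M_{V_i} = \int_{V_i} \phi(q)\, dq, \quad L_{V_i} = \int_{V_i} q \phi(q)\, dq, \quad \text{and} \quad C_{V_i} = L_{V_i} / M_{V_i}, \tag{4.2}
\]

respectively. Finally, recall the result proved in Lemma 2.1,

\[
\frac{\partial \mathcal{H}}{\partial p_i} = -\int_{V_i} (q - p_i) \phi(q)\, dq = -M_{V_i} (C_{V_i} - p_i). \tag{4.3}
\]

Equation (4.3) implies that critical points of $\mathcal{H}$ correspond to the configurations such
that $p_i = C_{V_i}\ \forall i$, that is, each agent is located at the centroid of its Voronoi region.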
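
In practice the quantities in (4.2) and (4.3) are approximated on a grid. The following sketch, in which the grid resolution, the uniform $\phi$, and the function names are assumptions made for illustration, assigns each grid cell to its nearest robot, accumulates the mass and first moment, and evaluates the gradient $-M_{V_i}(C_{V_i} - p_i)$.

```python
def voronoi_moments(robots, phi, side=20):
    """Grid approximation of mass M, first moment L and centroid C per Voronoi cell.

    Assumes every robot owns at least one grid cell with positive weight.
    """
    area = 1.0 / side ** 2
    mass = [0.0] * len(robots)
    moment = [[0.0, 0.0] for _ in robots]
    for a in range(side):
        for b in range(side):
            q = ((a + 0.5) / side, (b + 0.5) / side)
            # The nearest robot owns this grid cell (ties go to the lowest index).
            i = min(range(len(robots)),
                    key=lambda r: (q[0] - robots[r][0]) ** 2 + (q[1] - robots[r][1]) ** 2)
            w = phi(q) * area
            mass[i] += w
            moment[i][0] += w * q[0]
            moment[i][1] += w * q[1]
    centroid = [(m[0] / M, m[1] / M) for m, M in zip(moment, mass)]
    return mass, moment, centroid


def gradient(robots, mass, centroid):
    """dH/dp_i = -M_i (C_i - p_i), following (4.3)."""
    return [(-M * (c[0] - p[0]), -M * (c[1] - p[1]))
            for p, M, c in zip(robots, mass, centroid)]


robots = [(0.25, 0.5), (0.75, 0.5)]  # hypothetical symmetric configuration
mass, moment, centroid = voronoi_moments(robots, lambda q: 1.0)
grad = gradient(robots, mass, centroid)
print(mass, centroid, grad)
# With uniform phi each robot already sits at the centroid of its half of
# the unit square, so both gradients vanish up to rounding.
```

This configuration is a critical point of the cost in the sense of (4.3): every robot coincides with the centroid of its own Voronoi region.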

4.2.1  Assumptions and Definitions

This brings us to the concept of optimal coverage summarized in the following definition.

Definition 4.1 (Optimal Coverage Configuration) A robot network is said to
be in a (locally) optimal coverage configuration if every robot is positioned at the
centroid of its Voronoi region, $p_i = C_{V_i}\ \forall i$.

We emphasize again that global optimization of (4.1) is known to be difficult (NP-hard
for a given discrete representation of $\phi(q)$) even in the centralized case with
full information. Thus when we refer to an optimal coverage configuration we mean
a locally optimal one. Variations on the control law which attempt to find global
minima through exploration are discussed in [87,92].

We  also  make  the  assumptions,  already  discussed  in  Chapter  2,  that  the  robots 
have  dynamics 

\[
\dot{p}_i = u_i, \tag{4.4}
\]

where $u_i$ is the control input, and that they are able to compute their own Voronoi
cell, $V_i = \{q \mid \|q - p_i\| \le \|q - p_j\|,\ \forall j \ne i\}$.

More importantly for this chapter, we use a basis function approximation scheme
to learn the sensory function $\phi(q)$. Let $\mathcal{K} : Q \mapsto \mathbb{R}^m_{\ge 0}$ be a vector of bounded, continuous
basis functions. Each robot has these functions available for computation. The
sensory function approximation for robot $i$ is given by $\hat{\phi}_i(q, t) = \mathcal{K}(q)^T \hat{a}_i(t)$, where
$\hat{a}_i(t)$ is a parameter vector that is tuned according to an adaptation law which we will
describe  in  Section  4.3.  Figure  4-2  shows  a  graphical  representation  of  this  function 
approximation  scheme.  For  our  analysis,  we  require  that  the  following  assumption 
holds. 

Assumption 4.1 (Matching Conditions) There exists an ideal parameter vector
$a \in \mathbb{R}^m$ such that

\[
\phi(q) = \mathcal{K}(q)^T a, \tag{4.5}
\]

and $a$ is unknown to the robots. Furthermore,

\[
a \ge \mathbf{1} a_{\min}, \tag{4.6}
\]

where $a_{\min} \in \mathbb{R}_{>0}$ is a lower bound known by each robot.

Figure 4-2: The sensory function approximation is illustrated in this simplified 2-D
schematic. The true sensory function is represented by $\phi(q)$ and robot $i$'s approximation
of the sensory function is $\hat{\phi}_i(q)$. The basis function vector $\mathcal{K}(q)$ is shown as
three Gaussians (dashed curves), and the parameter vector $\hat{a}_i$ denotes the weighting
of each Gaussian.

Requirements  such  as  Assumption  4.1  are  common  for  adaptive  controllers.  In 
theory,  the  assumption  is  not  limiting  since  any  function  (with  some  smoothness 
requirements)  over  a  bounded  domain  can  be  approximated  arbitrarily  well  by  some 
set  of  basis  functions  [88].  In  practice,  however,  designing  a  suitable  set  of  basis 
functions  requires  application-specific  expertise. 

There is a variety of basis function families to choose from for $\mathcal{K}(q)$. We use Gaussians
in our simulations, but other options include wavelets, sigmoids, and splines.
Gaussian basis functions have a computational advantage over non-local basis functions
because, in any discrete representation, they have compact support. To compute
the value of the network at a location, $\hat\phi_i(q)$, or to tune the weights of the network $\hat a_i$
with new data, one has only to consider Gaussians in a region around the point of
interest.
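The approximation scheme above can be sketched in a few lines. This is a minimal illustration, not the implementation used in the thesis: the 3x3 grid of Gaussian centers, the width `sigma`, the unnormalized Gaussian form, and the parameter values are all assumptions chosen for the example. It evaluates $\mathcal{K}(q)$ and $\hat\phi_i(q) = \mathcal{K}(q)^T \hat a_i$, and checks that the approximation error is linear in the parameter error, which is the property the analysis relies on.

```python
import numpy as np

def gaussian_basis(q, centers, sigma):
    """Evaluate K(q): one unnormalized isotropic Gaussian per row of `centers`."""
    q = np.asarray(q, dtype=float)
    d2 = np.sum((centers - q) ** 2, axis=1)   # squared distance to each center
    return np.exp(-d2 / (2.0 * sigma ** 2))

def phi_hat(q, a_hat, centers, sigma):
    """Sensory function approximation phi_hat(q) = K(q)^T a_hat."""
    return gaussian_basis(q, centers, sigma) @ a_hat

# Hypothetical setup: 9 Gaussians on a 3x3 grid over the unit square.
ticks = np.array([1 / 6, 1 / 2, 5 / 6])
centers = np.array([[x, y] for x in ticks for y in ticks])
sigma = 0.18

a_true = np.full(9, 0.1)   # "ideal" parameters (assumed for the example)
a_true[4] = 2.0            # strong weight on the central Gaussian
a_hat = np.full(9, 0.1)    # a robot's initial estimate

q = np.array([0.4, 0.7])
err = phi_hat(q, a_hat, centers, sigma) - phi_hat(q, a_true, centers, sigma)
```

Because the approximation is linear in the parameters, `err` equals `gaussian_basis(q, centers, sigma) @ (a_hat - a_true)`.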

Define the moment approximations using $\hat\phi_i(q,t)$ as

$$\hat M_{V_i} = \int_{V_i} \hat\phi_i(q,t)\,dq, \qquad \hat L_{V_i} = \int_{V_i} q\,\hat\phi_i(q,t)\,dq, \qquad \hat C_{V_i} = \frac{\hat L_{V_i}}{\hat M_{V_i}}. \tag{4.7}$$

Next, define the parameter error

$$\tilde a_i(t) = \hat a_i(t) - a, \tag{4.8}$$

and notice the relation

$$\hat\phi_i(q,t) - \phi(q) = \mathcal{K}(q)^T \tilde a_i(t). \tag{4.9}$$

In order to compress the notation, we introduce the shorthand $\mathcal{K}_i(t) = \mathcal{K}(p_i(t))$ for
the value of the basis function vector at the position of robot $i$, and $\phi_i(t) = \phi(p_i(t))$
for the value of the sensory function at the position of robot $i$. As previously stated,
robot $i$ can measure $\phi_i$ with its sensors. We will also commonly refer to quantities
without explicitly writing their arguments. However, we may include arguments in
some instances to avoid ambiguity.

The function approximation framework described above brings us to another concept
of optimality for coverage.

Definition 4.2 (Near-Optimal Coverage Configuration) A robot network is said
to be in a near-optimal coverage configuration if each robot is positioned at the estimated
centroid of its Voronoi region, $p_i = \hat C_{V_i}\ \forall i$.

Finally, we distinguish between two qualities of function approximations.

Definition 4.3 (Globally True Approximation) A robot is said to have a globally
true (or just true) approximation of the sensory function if its approximation is
equal to the actual sensory function at every point of its domain, $\hat\phi_i(q) = \phi(q)\ \forall q \in Q$.

Definition 4.4 (Locally True Approximation) A robot is said to have a locally
true approximation of the sensory function over a subset $\Omega_i \subset Q$ if its approximation
is equal to the true function at every point in the subset, $\hat\phi_i(q) = \phi(q)\ \forall q \in \Omega_i$.

In light of the above definitions, if the parameter error is zero, $\tilde a_i = 0$, then robot $i$
has a true approximation of the sensory function. Also, if $\tilde a_i = 0\ \forall i$, then a near-optimal
coverage configuration is also optimal. An overview of the geometrical objects
involved in our set-up is shown in Figure 4-3.

Figure 4-3: A graphical overview of the quantities involved in the controller and
environment is shown. The robots move to cover a bounded, convex area $Q$; their
positions are $p_i$, and they each have a Voronoi region $V_i$ with a true centroid $C_{V_i}$ and
an estimated centroid $\hat C_{V_i}$. The true centroid is determined using a sensory function
$\phi(q)$, which indicates the relative importance of points $q$ in $Q$. The robots do not
know $\phi(q)$, so they calculate an estimated centroid using an approximation $\hat\phi_i(q)$
learned from sensor measurements of $\phi(q)$.


4.3  Decentralized  Adaptive  Control  Law 


We want a controller to drive the robots to an optimal configuration, that is, we
want to position them at their Voronoi centroids. We emphasize that it is not easy
to position a robot at its Voronoi centroid because (1) the robot does not know the
sensory function $\phi(q)$, which is required to calculate its centroid, and (2) the centroid
moves as a nonlinear function of the robot's position. To overcome the first problem,
our controller learns an approximation of the centroid on-line. To overcome the
second problem, our controller causes each robot to pursue its estimated centroid.
We will prove that the robots achieve a near-optimal configuration, and that every
robot learns a locally true approximation of the sensory function. Furthermore, if a
robot's path is sufficiently rich, it achieves a globally true approximation, and if every
robot's path is sufficiently rich, the robots reach an optimal configuration.

We propose to use the control law

$$u_i = K(\hat C_{V_i} - p_i), \tag{4.10}$$

where $K$ is a (potentially time-varying) uniformly positive definite control gain matrix,
which may have a skew-symmetric component to encourage exploration as in [92].
The area $Q$ is required to be convex so that the control law is feasible, that is, the
robots never attempt to cross the boundaries of $Q$. Since $\hat C_{V_i} \in V_i \subset Q$ and $p_i \in Q$,
by convexity, the segment connecting the two is in $Q$, and the control law is feasible.
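To make the control law concrete, the following sketch computes the estimated centroids on a discretized environment and applies $u_i = K(\hat C_{V_i} - p_i)$. It is an illustration under stated assumptions rather than the thesis implementation: the unit-square environment, the 40x40 cell-centered grid, the single constant basis function (so that $\hat\phi_i$ is uniform), and the function names are all hypothetical.

```python
import numpy as np

def estimated_centroids(P, A_hat, basis, grid):
    """Grid approximation of C_hat_{V_i} = L_hat_{V_i} / M_hat_{V_i} for every robot.

    P: (n, 2) robot positions, A_hat: (n, m) parameter estimates,
    basis: callable mapping (G, 2) points to (G, m) basis values, grid: (G, 2).
    The grid cell area cancels in the ratio, so it is omitted.
    """
    d2 = ((grid[:, None, :] - P[None, :, :]) ** 2).sum(axis=2)  # (G, n) squared distances
    owner = d2.argmin(axis=1)          # index of the nearest robot: Voronoi assignment
    Kq = basis(grid)
    C_hat = np.zeros_like(P)
    for i in range(len(P)):
        cell = grid[owner == i]
        phi_hat = Kq[owner == i] @ A_hat[i]     # phi_hat_i evaluated on robot i's cell
        C_hat[i] = (phi_hat[:, None] * cell).sum(axis=0) / phi_hat.sum()
    return C_hat

def control(P, C_hat, K_gain):
    """Control law (4.10): u_i = K (C_hat_i - p_i), applied row-wise."""
    return (C_hat - P) @ K_gain.T

# Hypothetical example: two robots on the unit square with a uniform estimate.
N = 40
ticks = (np.arange(N) + 0.5) / N
grid = np.array([[x, y] for x in ticks for y in ticks])
uniform = lambda G: np.ones((len(G), 1))     # a single constant basis function
P = np.array([[0.1, 0.5], [0.9, 0.5]])
A_hat = np.ones((2, 1))
C_hat = estimated_centroids(P, A_hat, uniform, grid)
U = control(P, C_hat, np.eye(2))
```

With a uniform estimate the two Voronoi cells are the left and right halves of the square, so each robot is driven toward the center of its half, and a step along `U` stays inside the convex environment.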

The parameters $\hat a_i$ used to calculate $\hat C_{V_i}$ are adjusted according to a set of adaptation
laws which are introduced below. First, we define two quantities,

$$\Lambda_i(t) = \int_0^t w(\tau)\mathcal{K}_i(\tau)\mathcal{K}_i(\tau)^T\,d\tau, \quad\text{and}\quad \lambda_i(t) = \int_0^t w(\tau)\mathcal{K}_i(\tau)\phi_i(\tau)\,d\tau. \tag{4.11}$$

These can be calculated differentially by robot $i$ using $\dot\Lambda_i = w(t)\mathcal{K}_i\mathcal{K}_i^T$ and $\dot\lambda_i = w(t)\mathcal{K}_i\phi_i$,
with zero initial conditions. The function $w(t) \ge 0$ determines a data
collection weighting. We require that it is integrable (belongs to $L^1$), and continuous
(belongs to $C^0$). Define another quantity

$$F_i = \frac{\int_{V_i}\mathcal{K}(q)(q - p_i)^T\,dq\;K\int_{V_i}(q - p_i)\mathcal{K}(q)^T\,dq}{\int_{V_i}\hat\phi_i(q)\,dq}. \tag{4.12}$$

Notice that $F_i$ is a positive semi-definite matrix. It can also be computed by robot $i$
as it does not require any knowledge of the true parameter vector, $a$. The adaptation
law for $\hat a_i$ is now defined as

$$\dot{\hat a}_{\mathrm{pre}_i} = -F_i\hat a_i - \gamma(\Lambda_i\hat a_i - \lambda_i). \tag{4.13}$$

The two terms in (4.13) have an intuitive interpretation. The first term compensates
for uncertainty in the centroid position estimate. The second term carries out a
gradient descent to minimize the sensory function error $\hat\phi_i(p_i) - \phi(p_i)$ integrated over time.
The gradient descent interpretation is explored more in Section 4.7. We stress that
a decentralized implementation requires that each robot adapts its own parameter
vector using local information available to it. If one were interested, instead, in
designing a centralized adaptation law, one could simply use a common parameter
vector that is adapted using the information from all robots.

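Equation (4.13) is the main adaptation law; however, the controller (4.10) has a
singularity at $\hat a_i = 0$ (since $\hat M_{V_i}$ is in the denominator of $\hat C_{V_i}$). For this reason we
prevent the parameters from dropping below $a_{\min} > 0$ using a parameter projection
[101],

$$\dot{\hat a}_i = \Gamma(\dot{\hat a}_{\mathrm{pre}_i} - I_{\mathrm{proj}_i}\dot{\hat a}_{\mathrm{pre}_i}), \tag{4.14}$$

where $\Gamma \in \mathbb{R}^{m\times m}$ is a diagonal, positive definite adaptation gain matrix, and the
diagonal matrix $I_{\mathrm{proj}_i}$ is defined element-wise as

$$I_{\mathrm{proj}_i}(j) = \begin{cases} 0 & \text{for } \hat a_i(j) > a_{\min} \\ 0 & \text{for } \hat a_i(j) = a_{\min} \text{ and } \dot{\hat a}_{\mathrm{pre}_i}(j) \ge 0 \\ 1 & \text{otherwise,} \end{cases} \tag{4.15}$$

where $(j)$ denotes the $j$th element for a vector and the $j$th diagonal element for a
matrix.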
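The projection in (4.14) and (4.15) amounts to zeroing any component of the pre-projection rate that would push a parameter below $a_{\min}$. A minimal sketch follows; the function name `projected_rate` and the representation of the diagonal gain $\Gamma$ as a vector are assumptions for the example. It applies (4.15) literally, so the exact equality test is only meaningful in the idealized continuous-time setting; a discrete-time implementation would presumably also clip the parameters to $a_{\min}$ after each step.

```python
import numpy as np

def projected_rate(a_hat, a_dot_pre, a_min, gamma_diag):
    """Return Gamma (a_dot_pre - I_proj a_dot_pre) with I_proj defined as in (4.15).

    gamma_diag holds the diagonal of the adaptation gain matrix Gamma.
    """
    a_hat = np.asarray(a_hat, dtype=float)
    a_dot_pre = np.asarray(a_dot_pre, dtype=float)
    # I_proj(j) = 0 when the parameter is above the bound, or sits on the
    # bound while its rate points back into the feasible region.
    free = (a_hat > a_min) | ((a_hat == a_min) & (a_dot_pre >= 0.0))
    i_proj = np.where(free, 0.0, 1.0)
    return np.asarray(gamma_diag, dtype=float) * (a_dot_pre - i_proj * a_dot_pre)

# Three parameters: one well above the bound, two sitting exactly on it.
rate = projected_rate(a_hat=[1.0, 0.1, 0.1],
                      a_dot_pre=[-1.0, -1.0, 2.0],
                      a_min=0.1,
                      gamma_diag=[1.0, 1.0, 1.0])
```

Only the second component is blocked: it sits on the bound and its rate points downward, so its projected rate is zero, while the other two rates pass through unchanged.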

The  controller  described  above  will  be  referred  to  as  the  basic  controller,  and  its 
behavior  is  formalized  in  the  following  theorem. 

Theorem 4.1 (Basic Convergence) Under Assumption 4.1, the network of robots
with dynamics (4.4), control law (4.10), and adaptation law (4.13, 4.14) converges to
a near-optimal coverage configuration. Furthermore, each robot converges to a locally
true approximation of the sensory function over the set $\Omega_i = \{p_i(\tau) \mid \tau \ge 0,\ w(\tau) > 0\}$,
made up of all points on the robot's trajectory with positive weighting.

Proof 4.1 We will define a lower-bounded, Lyapunov-like function and show that its
time derivative is non-positive. This will imply that it reaches a limit. Furthermore,
the time derivative is uniformly continuous, so by Barbalat's lemma³ [5,80] it
approaches zero. The quantities $\|\hat C_{V_i}(t) - p_i(t)\|$ and $w(\tau)(\hat\phi_i(p_i(\tau),t) - \phi(p_i(\tau)))^2$, $0 \le \tau \le t$, will
be included in the time derivative of this function, thereby implying $p_i(t) \to \hat C_{V_i}(t)$,
and $\hat\phi_i(q,t) \to \phi(q)\ \forall q \in \Omega_i$ for all $i$.

³ We cannot use the more typical LaSalle invariance theorem because our system is time-varying due to the data weighting function $w(t)$.


Define a Lyapunov-like function

$$\mathcal{V} = \mathcal{H} + \sum_{i=1}^n \frac{1}{2}\tilde a_i^T\Gamma^{-1}\tilde a_i, \tag{4.16}$$

which incorporates the sensing cost $\mathcal{H}$, and is quadratic in the parameter errors $\tilde a_i$.
Note that the sensing cost $\mathcal{H}$ is computed with the actual sensory function $\phi(q)$, so
it inherently incorporates function approximation errors as well. $\mathcal{V}$ is bounded below
by zero since $\mathcal{H}$ is a sum of integrals of strictly positive functions, and the quadratic
parameter error terms are each bounded below by zero.


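Taking the time derivative of $\mathcal{V}$ along the trajectories of the system gives

$$\dot{\mathcal{V}} = \sum_{i=1}^n\left[\frac{\partial\mathcal{H}}{\partial p_i}^T\dot p_i + \tilde a_i^T\Gamma^{-1}\dot{\tilde a}_i\right], \tag{4.17}$$

and substituting from (4.3) and noticing that $\dot{\tilde a}_i = \dot{\hat a}_i$ yields

$$\dot{\mathcal{V}} = \sum_{i=1}^n\left[-\int_{V_i}(q - p_i)^T\phi(q)\,dq\,\dot p_i + \tilde a_i^T\Gamma^{-1}\dot{\hat a}_i\right]. \tag{4.18}$$

Using (4.9) to substitute for $\phi(q)$ gives

$$\dot{\mathcal{V}} = \sum_{i=1}^n\left[-\int_{V_i}(q - p_i)^T\hat\phi_i\,dq\,\dot p_i + \int_{V_i}\tilde a_i^T\mathcal{K}(q)(q - p_i)^T\,dq\,\dot p_i + \tilde a_i^T\Gamma^{-1}\dot{\hat a}_i\right].$$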

Substituting for $\dot p_i$ with (4.4) and (4.10), and moving $\tilde a_i$ out of the second integral
(since it is not a function of $q$) leads to

$$\dot{\mathcal{V}} = \sum_{i=1}^n\left[-\hat M_{V_i}(\hat C_{V_i} - p_i)^T K(\hat C_{V_i} - p_i) + \tilde a_i^T\int_{V_i}\mathcal{K}(q)(q - p_i)^T\,dq\,K(\hat C_{V_i} - p_i) + \tilde a_i^T\Gamma^{-1}\dot{\hat a}_i\right].$$

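Expanding $(\hat C_{V_i} - p_i)$ in the second term, and substituting for $\dot{\hat a}_i$ with (4.14) gives

$$\dot{\mathcal{V}} = \sum_{i=1}^n\left[-\hat M_{V_i}(\hat C_{V_i} - p_i)^T K(\hat C_{V_i} - p_i) + \tilde a_i^T F_i\hat a_i - \tilde a_i^T F_i\hat a_i - \tilde a_i^T\gamma(\Lambda_i\hat a_i - \lambda_i) - \tilde a_i^T I_{\mathrm{proj}_i}\dot{\hat a}_{\mathrm{pre}_i}\right].$$

Now we can expand $(\Lambda_i\hat a_i - \lambda_i)$, noting that $\lambda_i = \int_0^t w(\tau)\mathcal{K}_i(\mathcal{K}_i^T a)\,d\tau$, to get

$$\dot{\mathcal{V}} = -\sum_{i=1}^n\left[\hat M_{V_i}(\hat C_{V_i} - p_i)^T K(\hat C_{V_i} - p_i) + \tilde a_i^T\gamma\int_0^t w(\tau)\mathcal{K}_i\mathcal{K}_i^T\tilde a_i(t)\,d\tau + \tilde a_i^T I_{\mathrm{proj}_i}\dot{\hat a}_{\mathrm{pre}_i}\right],$$

and, finally, bringing $\tilde a_i^T$ inside the integral (it is not a function of $\tau$, though it is a
function of $t$) results in

$$\dot{\mathcal{V}} = -\sum_{i=1}^n\left[\hat M_{V_i}(\hat C_{V_i} - p_i)^T K(\hat C_{V_i} - p_i) + \gamma\int_0^t w(\tau)\left(\mathcal{K}_i(\tau)^T\tilde a_i(t)\right)^2 d\tau + \tilde a_i^T I_{\mathrm{proj}_i}\dot{\hat a}_{\mathrm{pre}_i}\right]. \tag{4.19}$$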

Inside the sum, the first and second terms are clearly non-negative. We focus
momentarily on the third term. Expanding it as a sum of scalar terms, we see that the
$j$th scalar term is of the form

$$\tilde a_i(j)\,I_{\mathrm{proj}_i}(j)\,\dot{\hat a}_{\mathrm{pre}_i}(j). \tag{4.20}$$

From (4.15), if $\hat a_i(j) > a_{\min}$, or $\hat a_i(j) = a_{\min}$ and $\dot{\hat a}_{\mathrm{pre}_i}(j) \ge 0$, then $I_{\mathrm{proj}_i}(j) = 0$
and the term vanishes. Now, in the case $\hat a_i(j) = a_{\min}$ and $\dot{\hat a}_{\mathrm{pre}_i}(j) < 0$, we have
$\tilde a_i(j) = \hat a_i(j) - a(j) \le 0$ (from Assumption 4.1). Furthermore, $I_{\mathrm{proj}_i}(j) = 1$ and
$\dot{\hat a}_{\mathrm{pre}_i}(j) < 0$ implies that the term is non-negative. In all cases, then, each term of
the form (4.20) is non-negative, and all three terms inside the sum in (4.19) are
non-negative. Thus $\dot{\mathcal{V}} \le 0$.

We have that $\mathcal{V}$ is lower bounded and $\dot{\mathcal{V}} \le 0$, so $\mathcal{V}$ approaches a limit. We establish
the uniform continuity of $\dot{\mathcal{V}}$ in Lemma A.1 in Appendix A, so by Barbalat's lemma
$\lim_{t\to\infty}\dot{\mathcal{V}} = 0$. From (4.19), this implies $\lim_{t\to\infty}\|p_i(t) - \hat C_{V_i}(t)\| = 0\ \forall i$ from the first
term in the sum, so the network converges to a near-optimal coverage configuration.

Furthermore, from $\mathcal{K}_i(\tau)^T\tilde a_i(t) = \hat\phi_i(p_i(\tau),t) - \phi(p_i(\tau))$, we have from the second
term of (4.19)

$$\lim_{t\to\infty}\int_0^t w(\tau)\left(\hat\phi_i(p_i(\tau),t) - \phi(p_i(\tau))\right)^2 d\tau = 0 \quad \forall i = 1,\ldots,n. \tag{4.21}$$

Now notice that the integrand in (4.21) is non-negative, therefore it must converge
to zero for all $\tau$ except on a set of Lebesgue measure zero. Suppose the integrand is
greater than zero at some point $\tau$. The integrand is continuous (since $\mathcal{K}_i(t)$, $\hat a_i(t)$, and
$\phi_i(t)$ are), so if it is greater than zero at $\tau$, it is greater than zero in a neighborhood
of non-zero measure around it, $(\tau - \epsilon, \tau + \epsilon)$, for some $\epsilon > 0$, which is a contradiction.
Thus, we have $\hat\phi_i(q,t) \to \phi(q)\ \forall q \in \Omega_i$ and $\forall i$.

In [92] the following extension to the above theorem was derived. We restate it
here to give a more thorough characterization of the controller's behavior.

Corollary 4.1 (Sufficient Richness for Basic Controller) In addition to the conditions
for Theorem 4.1, if the robots' paths are such that the matrix $\lim_{t\to\infty}\Lambda_i(t)$ is
positive definite $\forall i$, the network converges to an optimal coverage configuration, and
each robot converges to a globally true approximation of the sensory function, $\phi(q)$.

Proof 4.2 Consider the second term in (4.19). Move the two $\tilde a_i(t)$ outside of the
integral (since they are not a function of $\tau$) to get

$$\gamma\,\tilde a_i(t)^T\left[\int_0^t w(\tau)\mathcal{K}_i\mathcal{K}_i^T\,d\tau\right]\tilde a_i(t) = \gamma\,\tilde a_i(t)^T\Lambda_i(t)\tilde a_i(t).$$

Since $\dot{\mathcal{V}} \to 0$, if $\lim_{t\to\infty}\Lambda_i(t)$ is positive definite (we know the limit exists because
$\mathcal{K}(q)$ is bounded and $w(t) \in L^1$), then $\tilde a_i(t) \to 0$. This implies that robot $i$ converges
to a globally true approximation of the sensory function, $\phi(q)$. Furthermore,
if $\lim_{t\to\infty}\Lambda_i(t) > 0\ \forall i$, then $\hat C_{V_i} = C_{V_i}\ \forall i$, so the network converges to an optimal
coverage configuration.


Remark 4.1 One may wonder how the controller will behave if Assumption 4.1 fails,
so that there is no ideal parameter vector $a$ that will exactly reconstruct $\phi(q)$ from
the basis functions. Indeed, this will be the case in any real-world scenario. Such
a question requires a robustness analysis that is beyond the scope of this thesis, but
analyses of robustness for centralized adaptive controllers can be found, for example, in
[88] and most texts on adaptive control (e.g. [67,89,103]). It is observed in numerical
simulations that the adaptation law finds a parameter to make $\hat\phi_i(q)$ as close as possible
to $\phi(q)$, where closeness is measured by the integral of the squared difference, as
described in Section 4.7.

Remark 4.2 One may also wonder how the controller behaves with time-varying sensory
functions $\phi(q,t)$. It can be expected from existing results for centralized adaptive
controllers that our controller will track sensory functions that change slowly with
respect to the rate of adaptation of the parameters. The ability to track a time-varying
sensory function can be enhanced by using a forgetting factor in the data weighting
function $w(t)$ as described in Section 4.7.3.


4.4  Parameter  Consensus 

In this section we use the properties of graph Laplacians from Section 2.3.2 to prove
convergence and consensus of a modified adaptive control law. The controller from
(4.3) is modified so that the adaptation laws among Voronoi neighbors are coupled
with a weighting proportional to the length of their shared Voronoi edge. Adaptation
and consensus were also combined in [68] and [112]; however, in those works consensus
was used to align the velocities of agents, not to help in the parameter adaptation
process itself. Our use of consensus is more related to the recent algorithms for
distributed filtering described in [62] and [117].


4.4.1  Consensus  Learning  Law 

We add a term to the parameter adaptation law in (4.13) to couple the adaptation
of parameters between neighboring agents. Let the new adaptation law be given by

$$\dot{\hat a}_{\mathrm{pre}_i} = -F_i\hat a_i - \gamma(\Lambda_i\hat a_i - \lambda_i) - \zeta\sum_{j=1}^n w_{ij}(\hat a_i - \hat a_j), \tag{4.22}$$

where $w_{ij}$ is a weighting over the Delaunay graph (see Section 2.3.2) between two
robots $i$ and $j$ and $\zeta \in \mathbb{R}$, $\zeta > 0$, is a positive gain. The projection remains the same
as in (4.14), namely

$$\dot{\hat a}_i = \Gamma(\dot{\hat a}_{\mathrm{pre}_i} - I_{\mathrm{proj}_i}\dot{\hat a}_{\mathrm{pre}_i}).$$

A number of different weightings $w_{ij}$ are conceivable, but here we propose that $w_{ij}$
be equal to the length (area for $N = 3$, or volume for $N > 3$) of the shared Voronoi
edge of robots $i$ and $j$,

$$w_{ij} = \int_{V_i\cap V_j} dq. \tag{4.23}$$

Notice that $w_{ij} \ge 0$ and $w_{ij} = 0$ if and only if $i$ and $j$ are not Voronoi neighbors,
so $w_{ij}$ is a valid weighting over the Delaunay communication graph as described in
Section 2.3.2. This weighting is natural since one would want a robot to be influenced
by its neighbor in proportion to its neighbor's proximity. This form of $w_{ij}$ will also
provide for a simple analysis since it maintains the continuity of the right hand side
of (4.22), which is required for using Barbalat's lemma.

Theorem 4.2 (Convergence with Parameter Consensus) Under the conditions
of Theorem 4.1, using the parameter adaptation law (4.22), the network of robots
converges to a near-optimal coverage configuration. Furthermore, each robot converges to
a locally true approximation of the sensory function over the set of all points on every
robot's trajectory with positive weighting, $\Omega = \cup_{j=1}^n\Omega_j$. Additionally,

$$\lim_{t\to\infty}(\hat a_i - \hat a_j) = 0 \quad \forall i,j \in \{1,\ldots,n\}. \tag{4.24}$$


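Proof 4.3 We will use the same method as in the proof of Theorem 4.1, adding the
extra term for parameter coupling. It will be shown that this term is non-positive.
The claims of the proof follow as before from Barbalat's lemma.

Define $\mathcal{V}$ to be (4.16), which leads to

$$\dot{\mathcal{V}} = -\sum_{i=1}^n\left[\hat M_{V_i}(\hat C_{V_i} - p_i)^T K(\hat C_{V_i} - p_i) + \gamma\int_0^t w(\tau)\left(\mathcal{K}_i(\tau)^T\tilde a_i(t)\right)^2 d\tau + \tilde a_i^T I_{\mathrm{proj}_i}\dot{\hat a}_{\mathrm{pre}_i}\right] - \sum_{i=1}^n\tilde a_i^T\zeta\sum_{j=1}^n w_{ij}(\hat a_i - \hat a_j). \tag{4.25}$$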


We have already shown that the three terms inside the first sum are non-negative.
Now consider the parameter coupling term. We can rewrite this term using the graph
Laplacian defined in Section 2.3.2 as

$$\sum_{i=1}^n\tilde a_i^T\zeta\sum_{j=1}^n w_{ij}(\hat a_i - \hat a_j) = \zeta\sum_{j=1}^m\tilde\alpha_j^T L\hat\alpha_j, \tag{4.26}$$

where $\alpha_j = a(j)\mathbf{1}$, $\hat\alpha_j = [\hat a_1(j)\ \cdots\ \hat a_n(j)]^T$, and $\tilde\alpha_j = \hat\alpha_j - \alpha_j$. Recall the ideal
parameter vector $a = [a(1)\ \cdots\ a(j)\ \cdots\ a(m)]^T$, and the parameter estimate
for each agent $\hat a_i = [\hat a_i(1)\ \cdots\ \hat a_i(j)\ \cdots\ \hat a_i(m)]^T$. We have simply regrouped the parameters by
introducing the $\alpha_j$ notation. From Section 2.3.2 we saw that $\alpha_j^T L = a(j)\mathbf{1}^T L = 0$.
This gives

$$\zeta\sum_{j=1}^m\tilde\alpha_j^T L\hat\alpha_j = \zeta\sum_{j=1}^m\hat\alpha_j^T L\hat\alpha_j \ge 0, \tag{4.27}$$

since $L \ge 0$. Thus $\dot{\mathcal{V}} \le 0$.

Lemma A.2 establishes the uniform continuity of $\dot{\mathcal{V}}$ for this controller. We can
therefore use Barbalat's lemma to conclude that $\dot{\mathcal{V}} \to 0$. As before this implies the
two claims of Theorem 4.1. Since the graph Laplacian is positive semi-definite, and
$\hat a_i(j) \ge a_{\min}$, $\lim_{t\to\infty}\hat\alpha_j^T L\hat\alpha_j = 0 \Rightarrow \hat\alpha_j \to a_{\mathrm{final}}(j)\mathbf{1}\ \forall j \in \{1,\ldots,m\}$, where
$a_{\mathrm{final}} \in \mathbb{R}^m$ is some undetermined vector, which is the common final value of the
parameters for all of the agents. The consensus assertion (4.24) follows.

Finally, recall the fact that for robot $j$, $\hat\phi_j(q) \to \phi(q)$ over $\Omega_j$, but $\hat a_i \to \hat a_j$,
therefore $\hat\phi_i(q) \to \phi(q)$ over $\Omega_j$. This is true for all robots $i$ and $j$, therefore $\hat\phi_i \to \phi(q)$
over $\Omega = \cup_{j=1}^n\Omega_j$ for all $i$.
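The regrouping in (4.26) and (4.27) and the resulting consensus can be checked numerically. The sketch below is an illustration under assumptions: a fixed four-robot ring graph with hand-picked symmetric weights stands in for the time-varying Voronoi edge lengths, the parameter values are arbitrary, and only the coupling term of (4.22) is integrated (the $F_i$, $\Lambda_i$, and projection terms are omitted).

```python
import numpy as np

# Hypothetical symmetric weights w_ij on a ring of n = 4 robots.
W = np.array([[0.0, 1.0, 0.0, 0.5],
              [1.0, 0.0, 2.0, 0.0],
              [0.0, 2.0, 0.0, 1.0],
              [0.5, 0.0, 1.0, 0.0]])
L = np.diag(W.sum(axis=1)) - W             # weighted graph Laplacian

a = np.array([0.5, 1.0, 1.0])               # ideal parameters (m = 3)
A_hat = np.array([[1.0, 2.0, 0.5],          # row i holds robot i's estimate a_hat_i
                  [0.2, 1.5, 0.7],
                  [0.9, 0.3, 1.1],
                  [0.4, 0.8, 2.0]])
A_til = A_hat - a                           # parameter errors a_tilde_i

# Left side of (4.26) without the gain zeta: sum_i a_tilde_i^T sum_j w_ij (a_hat_i - a_hat_j).
coupling = sum(A_til[i] @ sum(W[i, j] * (A_hat[i] - A_hat[j]) for j in range(4))
               for i in range(4))
# Right side of (4.27) without zeta: sum_j alpha_hat_j^T L alpha_hat_j, one term per column.
quad = sum(A_hat[:, k] @ L @ A_hat[:, k] for k in range(3))

# Euler integration of the coupling term alone: a_hat_i_dot = -zeta sum_j w_ij (a_hat_i - a_hat_j).
zeta, dt = 1.0, 0.05
A = A_hat.copy()
for _ in range(500):
    A -= dt * zeta * (L @ A)
spread = (A.max(axis=0) - A.min(axis=0)).max()   # largest disagreement between robots
```

The two expressions agree and are non-negative, and under the coupling term alone the estimates collapse to a common vector. With symmetric weights that common vector is the average of the initial estimates; in the full controller the final value $a_{\mathrm{final}}$ is also shaped by the other adaptation terms.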

Corollary 4.2 (Sufficient Richness for Consensus Controller) In addition to
the conditions for Theorem 4.2, if the robots' paths are such that $\int_\Omega\mathcal{K}(q)\mathcal{K}(q)^T\,dq$
is positive definite, the network converges to an optimal coverage configuration, and
each robot converges to a globally true approximation of the sensory function, $\phi(q)$.

Proof 4.4 Since $\hat\phi_i(q,t) \to \phi(q)$ over $\Omega$, we have $\tilde a_i(\infty)^T\mathcal{K}(q)\mathcal{K}(q)^T\tilde a_i(\infty) = 0$ over
$\Omega$, where $\tilde a_i(\infty)$ is shorthand for $\lim_{t\to\infty}\tilde a_i(t)$. Then

$$0 = \int_\Omega\tilde a_i(\infty)^T\mathcal{K}(q)\mathcal{K}(q)^T\tilde a_i(\infty)\,dq = \tilde a_i(\infty)^T\int_\Omega\mathcal{K}(q)\mathcal{K}(q)^T\,dq\;\tilde a_i(\infty). \tag{4.28}$$

Therefore if $\int_\Omega\mathcal{K}(q)\mathcal{K}(q)^T\,dq > 0$, then $\tilde a_i(\infty) = 0$. This is true for all $i$.

Remark 4.3 The condition of Corollary 4.2 is less strict than that of Corollary 4.1
because only the union of all the robots' paths has to be sufficiently rich, not each
path individually. This means it is easier to achieve an optimal configuration with
the consensus controller.

Remark 4.4 Another commonly used weighting for algorithms over communication
graphs is

$$w_{ij} = \begin{cases} 1 & \text{for } j \in \mathcal{N}_i \\ 0 & \text{for } j \notin \mathcal{N}_i, \end{cases}$$

where $\mathcal{N}_i$ is the set of indices of neighbors of $i$, as was proposed in [100]. In this
case, stability can be proved, but with considerable complication in the analysis, since
$\dot{\mathcal{V}}$ is not continuous. Even so, recent extensions of Barbalat's lemma to differential
inclusions from [61,86] (and applied to flocking systems in [104]) can be used to prove
the same result as in Theorem 4.2.


Remark 4.5 Introducing parameter coupling increases parameter convergence rates
and makes the controller equations better conditioned for numerical integration, as
will be discussed in Section 4.8. However, there is a cost in increased communication
overhead. In a discrete-time implementation of the controller in which parameters
and robot positions are represented finitely with $b$ bits, a robot will have to transmit
$(m+2)b$ bits and receive $|\mathcal{N}_i|(m+2)b$ bits per time step, while for the basic controller
each robot must transmit $2b$ and receive $2|\mathcal{N}_i|b$ bits per time step. This may or may
not represent a significant communication overhead, depending upon $b$ and the speed
of the control loop. In hardware experiments we have found this to be a negligible
communication cost. Note that although discretization is necessary for a practical
implementation, it does not affect the essential phenomenon of consensus, as shown
in [33,48].
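The bit counts in Remark 4.5 are easy to tabulate. The helper below is a small illustration; the function name and the example values ($m = 9$ parameters, $b = 32$ bits, six neighbors) are assumptions, and planar robots (two position coordinates) are assumed, as in the remark.

```python
def bits_per_step(m, b, num_neighbors, consensus):
    """Return (bits transmitted, bits received) per time step for one robot.

    Each robot broadcasts its 2-D position and, for the consensus controller,
    its m parameters as well, each value encoded with b bits.
    """
    values = m + 2 if consensus else 2
    return values * b, num_neighbors * values * b

# Hypothetical example: m = 9 parameters, 32-bit values, 6 Voronoi neighbors.
with_consensus = bits_per_step(9, 32, 6, consensus=True)
basic = bits_per_step(9, 32, 6, consensus=False)
```

In this example the consensus controller transmits 352 bits and receives 2112 bits per step, compared with 64 and 384 bits for the basic controller.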


4.5  Adaptive  Gradient  Controller 

The parameter adaptation architecture developed thus far for the Voronoi controller
can be analogously constructed for any distributed gradient controller with an unknown
cost function if the gradient of that cost function can be linearly parameterized.
Specifically, the distributed gradient controllers from Chapter 3 can all support adaptation
in the same way as the Voronoi controller. In this section we summarize the
most general case, which can then be specialized to a specific gradient controller. As
before, let $\mathcal{H}(P)$ be the unknown cost of a network of robots and let its gradient be
linearly parameterized by

$$\frac{\partial\mathcal{H}}{\partial p_i}\bigg|_P = \kappa_i(P)^T a, \tag{4.29}$$

where $\kappa_i : \mathcal{P}^n \mapsto \mathbb{R}^{m\times d}$ is known to agent $i$ but $a$ is unknown. Also, suppose
each robot has a sensor with which it can measure $\partial\mathcal{H}/\partial p_i|_{P(t)}$ at the current robot
configuration $P(t)$, and let the robots communicate over a communication graph with
a weighting function $w_{ij}$. The Voronoi controller fits this description with

$$\kappa_i(P) = -\int_{V_i(P)}\mathcal{K}(q)(q - p_i)^T\,dq. \tag{4.30}$$

Let the robots' dynamics be given by

$$\dot p_i = -K\kappa_i(P)^T\hat a_i, \tag{4.31}$$

and the adaptation of robot $i$'s estimated parameters be given by

$$\dot{\hat a}_{\mathrm{pre}_i} = -\kappa_i(P)K\kappa_i(P)^T\hat a_i - \gamma(\Lambda_i\hat a_i - \lambda_i) - \zeta\sum_{j=1}^n w_{ij}(\hat a_i - \hat a_j), \tag{4.32}$$

where

$$\Lambda_i(t) = \int_0^t w(\tau)\kappa_i(P(\tau))\kappa_i(P(\tau))^T\,d\tau \tag{4.33}$$

and

$$\lambda_i(t) = \int_0^t w(\tau)\kappa_i(P(\tau))\frac{\partial\mathcal{H}}{\partial p_i}\bigg|_{P(\tau)}\,d\tau. \tag{4.34}$$

To be precise, $\Lambda_i$ and $\lambda_i$ as defined here are slightly different from their definition for
the Voronoi controller. Convergence and consensus results analogous to Theorem 4.2
and Corollary 4.2 follow directly for this controller using the same proof arguments.


4.6  Parameter  Convergence  Analysis 

As a separate matter from the asymptotic convergence in Theorem 4.1 and Theorem
4.2, one may wonder how quickly parameters converge to their final values. In this
section we show that parameter convergence is not exponential, though given sufficiently
rich trajectories it can be shown to converge exponentially to an arbitrarily
small error. The rate of this convergence is shown to be faster for the controller with
parameter consensus than for the basic controller. We neglect the projection operation,
as the non-smooth switching considerably complicates the convergence analysis.

From (4.13) and (4.14), neglecting the projection, but including the adaptation
gain matrix $\Gamma$, we have

$$\dot{\tilde a}_i = -\Gamma\left(F_i\hat a_i + \gamma(\Lambda_i\hat a_i - \lambda_i)\right), \tag{4.35}$$

which can be written as

$$\dot{\tilde a}_i = -\Gamma\gamma\Lambda_i(t)\tilde a_i - \Gamma F_i\hat a_i, \tag{4.36}$$

leading to

$$\frac{d}{dt}\|\tilde a_i\| = -\frac{\gamma\,\tilde a_i^T\Gamma\Lambda_i(t)\tilde a_i}{\|\tilde a_i\|} - \frac{\tilde a_i^T\Gamma F_i\hat a_i}{\|\tilde a_i\|}. \tag{4.37}$$

Let $\lambda_{\min_i}(t) \ge 0$ be the minimum eigenvalue of $\Gamma\Lambda_i(t)$ (we know it is real-valued and
non-negative since $\Lambda_i(t)$ is symmetric positive semi-definite). Then we have

$$\frac{d}{dt}\|\tilde a_i\| \le -\gamma\lambda_{\min_i}(t)\|\tilde a_i\| + \|\Gamma F_i\hat a_i\|. \tag{4.38}$$

Now consider the signal $\|\Gamma F_i\hat a_i\|$. We proved in Theorem 4.1 that $\|\hat C_{V_i} - p_i\| \to 0$
and all other quantities in $\Gamma F_i\hat a_i$ are bounded for all $i$, therefore $\|\Gamma F_i\hat a_i\| \to 0$. Also,
$\lambda_{\min_i}(0) = 0$, and $\lambda_{\min_i}(t)$ is a nondecreasing function of time. Suppose at some
time $T$, robot $i$ has a sufficiently rich trajectory (so that $\Lambda_i(T)$ is positive definite,
as in Corollary 4.1), then $\lambda_{\min_i}(t) \ge \lambda_{\min_i}(T) > 0\ \forall t \ge T$. Then from (4.38), $\|\tilde a_i\|$
will decay faster than an exponentially stable first order system driven by $\|\Gamma F_i\hat a_i\|$.
Finally, the gains $\Gamma$ and $\gamma$ can be set so that $\|\Gamma F_i\hat a_i\|$ is arbitrarily small compared
to $\gamma\lambda_{\min_i}$, without affecting stability. Thus, if the robot's trajectory is sufficiently
rich, exponentially fast convergence to an arbitrarily small parameter error can be
achieved.


Now we consider a similar rate analysis for the controller with parameter consensus.
In this case, because the parameters are coupled among robots, we must consider
the evolution of all the robots' parameters together. Let

$$\tilde A = [\tilde a_1^T\ \cdots\ \tilde a_n^T]^T \tag{4.39}$$

be a concatenated vector consisting of all the robots' parameter errors. Also, define
the block diagonal matrices $F = \mathrm{diag}_{i=1}^n(\Gamma F_i)$, $\Lambda = \mathrm{diag}_{i=1}^n(\Gamma\Lambda_i)$, and the generalized
graph Laplacian matrix

$$\mathcal{L} = \begin{bmatrix} \Gamma(1)L(1,1)I_m & \cdots & L(1,n)I_m \\ \vdots & \ddots & \vdots \\ L(n,1)I_m & \cdots & \Gamma(n)L(n,n)I_m \end{bmatrix}. \tag{4.40}$$

The eigenvalues of $\mathcal{L}$ are the same as those of $\Gamma L$, but each eigenvalue has multiplicity
$m$. As for a typical graph Laplacian, $\mathcal{L}$ is positive semi-definite. The coupled dynamics
of the parameters over the network can be written

$$\dot{\tilde A} = -(\gamma\Lambda + \zeta\mathcal{L})\tilde A - F\hat A, \tag{4.41}$$

with $\hat A$ defined in the obvious way. Notice the similarity in form between (4.36) and
(4.41). Following the same type of derivation as before we find

$$\frac{d}{dt}\|\tilde A\| \le -\lambda_{\min}(t)\|\tilde A\| + \|F\hat A\|, \tag{4.42}$$

where $\lambda_{\min}(t) \ge 0$ is the minimum eigenvalue of $\gamma\Lambda(t) + \zeta\mathcal{L}(t)$. Again, it is real-valued
and non-negative since $\gamma\Lambda(t) + \zeta\mathcal{L}(t)$ is symmetric positive semi-definite.

As before, the signal $\|F\hat A\| \to 0$. If after some time $T$, $\mathrm{mineig}(\Lambda(T)) > 0$ then
$\lambda_{\min}(t) \ge \mathrm{mineig}(\Lambda(t)) > 0\ \forall t \ge T$ and the network's trajectory is sufficiently rich.
Then from (4.42), $\|\tilde A\|$ will decay at least as fast as an exponentially stable first order
system driven by $\|F\hat A\|$. Finally, the gains $\Gamma$, $\gamma$, and $\zeta$ can be set so that $\|F\hat A\|$ is
arbitrarily small compared to $\gamma\Lambda(t) + \zeta\mathcal{L}(t)$ without affecting stability. Thus, if the
robot network's trajectory is sufficiently rich, exponentially fast convergence to an
arbitrarily small parameter error can be achieved for the whole network.

To compare with the performance of the basic controller, consider that $\gamma\Lambda(t) \le
\gamma\Lambda(t) + \zeta\mathcal{L}(t)$. Therefore the minimum eigenvalue for the consensus controller is
always at least as large as that for the basic controller, implying convergence is at
least as fast. In practice, as we will see in Section 4.8, parameter convergence is
orders of magnitude faster for the consensus controller.


4.7  Alternative  Learning  Laws 

The adaptation law for parameter tuning (4.13) can be written more generally as

$$\dot{\hat a}_i = -F_i\hat a_i + f_i(p_i, V_i, \hat a_i, t), \tag{4.43}$$

where we have dropped the projection operation for clarity. There is considerable
freedom in choosing the learning function $f_i(\cdot)$. We are constrained only by our
ability to find a suitable Lyapunov-like function to accommodate Barbalat's lemma.

4.7.1 Gradient Laws

The form of $f_i(\cdot)$ chosen in Section 4.3 can be called a gradient law, since

$$f_i = -\frac{\partial}{\partial\hat a_i}\left[\frac{1}{2}\gamma\int_0^t w(\tau)(\hat\phi_i - \phi_i)^2\,d\tau\right] = -\gamma(\Lambda_i\hat a_i - \lambda_i). \tag{4.44}$$

The parameter vector follows the negative gradient of the Least Squares cost function,
seeking a minimum.
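The gradient law can be exercised in isolation with a short simulation. The sketch below makes several simplifying assumptions that are not part of the controller above: a single robot on a 1-D domain, a prescribed sinusoidal sweep instead of the closed-loop motion, three Gaussian basis functions, weighting $w(t) = 1$, $\Gamma = I$, and no $F_i$ term or projection. It integrates $\dot\Lambda_i$, $\dot\lambda_i$ from (4.11) and $\dot{\hat a}_i = -\gamma(\Lambda_i\hat a_i - \lambda_i)$ with forward Euler steps.

```python
import numpy as np

def basis(p, centers=np.array([0.2, 0.5, 0.8]), sigma=0.15):
    """K(p) on a 1-D domain: three Gaussians (an illustrative choice)."""
    return np.exp(-(p - centers) ** 2 / (2.0 * sigma ** 2))

def run_gradient_law(a_true, a_hat0, gamma=1.0, dt=0.01, T=20.0):
    """Euler-integrate Lambda, lambda and a_hat_dot = -gamma (Lambda a_hat - lambda)."""
    m = len(a_true)
    Lam, lam = np.zeros((m, m)), np.zeros(m)
    a_hat = np.array(a_hat0, dtype=float)
    for k in range(int(round(T / dt))):
        p = 0.5 + 0.4 * np.sin(k * dt)     # hypothetical sweep over [0.1, 0.9]
        K = basis(p)
        phi = K @ a_true                    # sensor measurement phi_i = phi(p_i)
        Lam += dt * np.outer(K, K)          # Lambda_dot = w K K^T, with w = 1
        lam += dt * K * phi                 # lambda_dot = w K phi
        a_hat += dt * (-gamma * (Lam @ a_hat - lam))
    return a_hat, Lam

a_true = np.array([1.0, 2.0, 0.5])          # ideal parameters (assumed for the example)
a_hat0 = a_true + 0.5                       # initial estimate with error in every entry
a_hat, Lam = run_gradient_law(a_true, a_hat0)
err0 = np.linalg.norm(a_hat0 - a_true)
err = np.linalg.norm(a_hat - a_true)
```

Since the measurements satisfy the matching condition exactly, $\lambda_i = \Lambda_i a$ and the update reduces to $\dot{\tilde a}_i = -\gamma\Lambda_i\tilde a_i$, so the parameter error norm `err` ends below its initial value `err0`, and `Lam` is symmetric positive semi-definite by construction.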


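Another possible learning law is to follow the gradient of the instantaneous squared
error, given by

$$f_i = -\frac{\partial}{\partial\hat{a}_i}\left[\frac{1}{2}\gamma\,w(t)\,(\hat{\phi}_i - \phi_i)^2\right] = -\gamma\,w(t)\,\mathcal{K}_i(\mathcal{K}_i^T\hat{a}_i - \phi_i). \tag{4.45}$$

Using the same Lyapunov function as before, it can be verified that this learning law
results in a near-optimal coverage configuration.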


These two gradient laws can be combined to give

$$f_i = -\gamma\left[w(t)\,\mathcal{K}_i(\mathcal{K}_i^T\hat{a}_i - \phi_i) + (\Lambda_i\hat{a}_i - \lambda_i)\right], \tag{4.46}$$

which is, in fact, equivalent to the first law with a weighting function
$w_c(t,\tau) = \delta(t-\tau)w(t) + w(\tau)$, where $\delta(t-\tau)$ is the Dirac delta function (we can make $w(\cdot)$ a
function of $t$ and $\tau$ with minimal consequences to the convergence proof). The same
Lyapunov-like function can be used, such that the resulting time derivative is

$$\dot{V} = -\sum_{i=1}^n\left[\hat{M}_{V_i}(\hat{C}_{V_i} - p_i)^T K(\hat{C}_{V_i} - p_i) + \tilde{a}_i^T I_{proj_i}\dot{\hat{a}}_{pre_i} + \gamma\,\tilde{a}_i^T\left[w(t)\,\mathcal{K}_i\mathcal{K}_i^T + \Lambda_i\right]\tilde{a}_i\right],$$

leading to the same convergence claims as in Theorem 4.1 and Corollary 4.1.
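
To make the combined gradient law concrete, the sketch below integrates it with a forward Euler step for a single robot. It is only an illustration under stated assumptions: the robot follows a prescribed path rather than the coverage controller, the term involving $F_i$ and the projection are omitted, and the two-element basis `basis`, the gains, and the "ideal" parameters `a_true` are hypothetical.

```python
import math

def basis(p):
    """Hypothetical two-element basis vector K(p) for a scalar position p."""
    return [p, 1.0 - p]

def combined_gradient_step(a_hat, Lam, lam, K, phi, w, gamma, dt):
    """Euler step of a_hat_dot = -gamma [w K (K^T a_hat - phi) + (Lam a_hat - lam)].

    Lam and lam are updated in place with the newly weighted data.
    """
    m = len(a_hat)
    err = sum(K[r] * a_hat[r] for r in range(m)) - phi   # K^T a_hat - phi
    new_a = []
    for r in range(m):
        integral = sum(Lam[r][c] * a_hat[c] for c in range(m)) - lam[r]
        new_a.append(a_hat[r] - dt * gamma * (w * K[r] * err + integral))
    for r in range(m):
        lam[r] += dt * w * K[r] * phi
        for c in range(m):
            Lam[r][c] += dt * w * K[r] * K[c]
    return new_a

a_true = [2.0, 0.5]                  # assumed ideal parameters
a_hat = [0.1, 0.1]                   # initial estimate
Lam = [[0.0, 0.0], [0.0, 0.0]]
lam = [0.0, 0.0]
dt, gamma, w = 0.01, 1.0, 1.0
for step in range(2000):
    p = 0.5 + 0.5 * math.sin(step * dt)        # prescribed, sufficiently rich path
    K = basis(p)
    phi = sum(k * a for k, a in zip(K, a_true))  # noiseless measurement
    a_hat = combined_gradient_step(a_hat, Lam, lam, K, phi, w, gamma, dt)

param_error = max(abs(e - t) for e, t in zip(a_hat, a_true))
print(param_error)
```

Because the path visits positions with linearly independent basis vectors, the accumulated matrix becomes positive definite and the estimate approaches `a_true`.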
4.7.2  Recursive  Least  Squares  Laws


Another  interesting  possibility  for  a  learning  law  is  the  continuous-time  Recursive 
Least  Squares  method.  This  law  can  be  interpreted  as  continuously  solving  the  Least 
Squares  minimization  problem  recursively  as  new  data  is  acquired.  Let 

$$J = \frac{1}{2}\int_0^t w(\tau)\,(\hat{\phi}_i - \phi_i)^2\,d\tau \tag{4.47}$$

be the standard Least Squares cost function with a data weighting function $w(\tau)$.
Then, taking the gradient with respect to $\hat{a}_i$ and setting it to zero, we find

$$\Lambda_i(t)\,\hat{a}_i = \lambda_i(t). \tag{4.48}$$

If the matrix $\Lambda_i(t)$ is full rank, we can pre-multiply both sides by its inverse to solve
the Least Squares problem. However, we seek a recursive expression, so taking the
time derivative we obtain

$$\dot{\hat{a}}_i = -P_i(t)\,w(t)\,\mathcal{K}_i(\mathcal{K}_i^T\hat{a}_i - \phi_i), \quad\text{where } P_i(t) = \Lambda_i(t)^{-1}. \tag{4.49}$$

Using the identity for the time derivative of a matrix inverse, $P_i$ can be computed
differentially by $\dot{P}_i = -P_i\,w(t)\,\mathcal{K}_i\mathcal{K}_i^T P_i$, but the initial conditions are ill defined, since
$\Lambda_i(0) = 0$ is not invertible. Instead, we must use some nonzero initial condition, $P_{i0}$,
with the same differential equation, to give the approximation

$$P_i = \Lambda_i^{-1} + P_{i0}. \tag{4.50}$$

The initial condition can be interpreted as the inverse covariance of our prior
knowledge of the parameter values. We should choose this to be small if we have no idea
of the ideal parameter values when setting initial conditions.
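
The recursive form can be sketched numerically as follows. This is an illustrative forward Euler integration of the parameter and $P_i$ equations for one robot, not the thesis implementation: the two-element basis, the prescribed path, the "ideal" parameters, and the initial condition `P0` are all assumptions, with `P0` chosen only so that the toy example converges visibly within the simulated horizon.

```python
import math

def basis(p):
    """Hypothetical two-element basis vector K(p) for a scalar position p."""
    return [p, 1.0 - p]

def rls_step(a_hat, P, K, phi, w, dt):
    """One Euler step of the continuous-time Recursive Least Squares law.

    a_hat_dot = -P w K (K^T a_hat - phi),   P_dot = -P w K K^T P
    """
    n = len(K)
    err = sum(K[r] * a_hat[r] for r in range(n)) - phi
    # P K; since P is symmetric, P K K^T P equals (P K)(P K)^T.
    PK = [sum(P[r][c] * K[c] for c in range(n)) for r in range(n)]
    new_a = [a_hat[r] - dt * w * PK[r] * err for r in range(n)]
    new_P = [[P[r][c] - dt * w * PK[r] * PK[c] for c in range(n)]
             for r in range(n)]
    return new_a, new_P

a_true = [2.0, 0.5]                     # assumed ideal parameters
a_hat = [0.1, 0.1]
P0 = 10.0                               # assumed nonzero initial condition
P = [[P0, 0.0], [0.0, P0]]
dt, w = 0.01, 1.0
for step in range(2000):
    p = 0.5 + 0.5 * math.sin(step * dt)  # prescribed path (assumption)
    K = basis(p)
    phi = sum(k * a for k, a in zip(K, a_true))
    a_hat, P = rls_step(a_hat, P, K, phi, w, dt)

param_error = max(abs(e - t) for e, t in zip(a_hat, a_true))
print(param_error, P)
```

In this sketch the gain matrix `P` shrinks as data accumulate, so later measurements change the estimate less than early ones.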

Before we can apply the Recursive Least Squares law to our controller, there is
one additional complication that must be dealt with. We can no longer use the same
projection operator to prevent the singularity when $\hat{M}_{V_i} = 0$. However, it is possible
to formulate a different stable controller that eliminates this singularity altogether.
This formulation also has the advantage that it no longer requires $a(j) \ge a_{\min}\ \forall j$ in
Assumption 4.1. We can use the controller

$$u_i = K(\hat{L}_{V_i} - \hat{M}_{V_i}\,p_i), \tag{4.51}$$

with the adaptation law

$$\dot{\hat{a}}_i = -P_i\left[\hat{M}_{V_i}F_i\hat{a}_i + w(t)\,\mathcal{K}_i(\mathcal{K}_i^T\hat{a}_i - \phi_i)\right] \tag{4.52}$$

to approximate the Recursive Least Squares law. Asymptotic convergence can be
proven for this case by using the Lyapunov function

$$V = \mathcal{H} + \sum_{i=1}^n\frac{1}{2}\tilde{a}_i^T P_i^{-1}\tilde{a}_i, \tag{4.53}$$

which leads to

$$\dot{V} = -\sum_{i=1}^n\left[\hat{M}_{V_i}^2(\hat{C}_{V_i} - p_i)^T K(\hat{C}_{V_i} - p_i) + \frac{1}{2}\tilde{a}_i^T\left[w(t)\,\mathcal{K}_i\mathcal{K}_i^T\right]\tilde{a}_i\right]. \tag{4.54}$$

Note that the only difference in the Lyapunov function is that $\Gamma$ has been replaced
with the time-varying quantity $P_i$.


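We can also formulate a learning law analogous to the combined gradient law
(4.46) as

$$\dot{\hat{a}}_i = -P_i\left[\hat{M}_{V_i}F_i\hat{a}_i + \frac{1}{2}w(t)\,\mathcal{K}_i(\mathcal{K}_i^T\hat{a}_i - \phi_i) + (\Lambda_i\hat{a}_i - \lambda_i)\right], \tag{4.55}$$

with $\Lambda_i$ and $\lambda_i$ defined as before. The same Lyapunov function (4.53) can be used,
resulting in

$$\dot{V} = -\sum_{i=1}^n\left[\hat{M}_{V_i}^2(\hat{C}_{V_i} - p_i)^T K(\hat{C}_{V_i} - p_i) + \tilde{a}_i^T\Lambda_i\tilde{a}_i\right].$$
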
Interestingly, the integral terms (those involving $\Lambda_i$ and $\lambda_i$) of the learning law in
(4.55) have a gradient interpretation. Taking just those terms we have

$$\begin{aligned}
f_i &= -P_i(\Lambda_i\hat{a}_i - \lambda_i) \\
    &= -\tilde{a}_i + P_{i0}\Lambda_i\tilde{a}_i \\
    &= -\frac{\partial}{\partial\tilde{a}_i}\left(\frac{1}{2}\tilde{a}_i^T\tilde{a}_i\right) + P_{i0}\Lambda_i\tilde{a}_i,
\end{aligned} \tag{4.56}$$

so the law approximates the gradient of the squared parameter error. The last term
on the right hand side arises from the mismatch in initial conditions between $P_i$ and
$\Lambda_i^{-1}$.

The combination of Least Squares and gradient learning apparent in this law is
quite similar to the Composite Adaptation described in [102, 103]. In fact, if one
identifies the prediction error as $\mathcal{K}_i^T\hat{a}_i - \phi_i$ and the tracking error as $\Lambda_i\hat{a}_i - \lambda_i$ we have
composite adaptation (except, of course, for the term containing $F_i$, which is required
for the stability proof).

Unfortunately,  it  is  found  that  the  equations  resulting  from  the  Least  Squares 
formulation  are  difficult  to  solve  numerically,  often  causing  robots  to  jump  outside  of 
the  area  Q,  which  then  corrupts  the  Voronoi  calculation.  Alleviating  this  problem  is 
a  matter  of  ongoing  research. 

4.7.3  Data  Weighting  Functions 

The form of the function $w(\cdot)$ can be designed to encourage parameter convergence.
One obvious choice is to make $w(t)$ a square wave, such that data is not incorporated
into $\int_0^t w(\tau)\,\mathcal{K}_i\mathcal{K}_i^T\,d\tau$ after some fixed time. This can be generalized to an exponential
decay, $w(\tau) = \exp(-\tau)$, or a decaying sigmoid $w(\tau) = \frac{1}{2}(\mathrm{erf}(c - \tau) + 1)$. Many other
options exist.

One intuitive option for $w(\cdot)$ is $w(\tau) = \|\dot{p}_i\|^2$, since the rate at which new data is
collected is directly dependent upon the rate of travel of the robot. This weighting, in
a sense, normalizes the effects of the rate of travel so that all new data is incorporated
with equal weighting. Likewise, when the robot comes to a stop, the value of $\phi(p_i)$
at the stopped position does not overwhelm the learning law. This seems to make
good sense, but there is an analytical technicality: to ensure that $\Lambda_i$ and $\lambda_i$ remain
bounded we have to prove that $\dot{p}_i \in L^2$. In practice, we can set $w(\tau) = \|\dot{p}_i\|^2$ up to
some fixed time, after which it is zero.

We can also set $w(t,\tau) = \exp\{-(t - \tau)\}$, which turns the integrators $\Lambda_i$, $P_i$, and
$\lambda_i$ into first order systems. This essentially introduces a forgetting factor into the
learning law, which has the advantage of being able to track slowly varying sensory
distributions. Forgetting factors can have other significant benefits such as improving
parameter convergence rates and allowing the flexibility to reject certain frequencies
of noise in the error signal. A thorough discussion of forgetting factors can be found
in [103], Section 8.7.
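
The weighting options discussed in this section can be written down directly. The sketch below is illustrative only: the function names, the cut-off time `t_stop`, and the use of a forward Euler step are assumptions. The last function shows how the exponential weighting in $t - \tau$ turns the integrator for the data matrix into a first order system, since differentiating $\int_0^t e^{-(t-\tau)}\mathcal{K}\mathcal{K}^T d\tau$ with respect to $t$ gives $-\Lambda + \mathcal{K}\mathcal{K}^T$.

```python
import math

def w_square(tau, t_stop):
    """Weight 1 until t_stop, then 0: data collected later is ignored."""
    return 1.0 if tau <= t_stop else 0.0

def w_exponential(tau):
    """Exponentially decaying weight, w(tau) = exp(-tau)."""
    return math.exp(-tau)

def w_sigmoid(tau, c):
    """Decaying sigmoid, w(tau) = (erf(c - tau) + 1) / 2."""
    return 0.5 * (math.erf(c - tau) + 1.0)

def w_speed(p_dot):
    """Weight equal to the squared speed ||p_dot||^2 of the robot."""
    return sum(v * v for v in p_dot)

def forgetting_step(Lam, K, dt):
    """Euler step of Lam_dot = -Lam + K K^T (forgetting-factor weighting)."""
    n = len(K)
    return [[Lam[r][c] + dt * (-Lam[r][c] + K[r] * K[c]) for c in range(n)]
            for r in range(n)]

# With a constant basis value the forgetting integrator settles at K K^T
# instead of growing without bound.
Lam = [[0.0]]
for _ in range(2000):
    Lam = forgetting_step(Lam, [1.0], 0.01)
print(Lam[0][0])
```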


4.8  Numerical  Simulations 


Simulations were carried out in a Matlab environment. The dynamics in (4.4) with
the control law in (4.10), and the adaptation laws in (4.14) (with (4.13) for the basic
controller and (4.22) for the consensus controller) for a group of $n = 20$ robots were
integrated forward in time. A numerical solver with a fixed time step of .01 s was
used to integrate the equations. The area $Q$ was taken to be the unit square. The
sensory function, $\phi(q)$, was parameterized as a linear combination of nine Gaussians.
In particular, for $\mathcal{K} = [\,\mathcal{K}(1)\ \cdots\ \mathcal{K}(9)\,]^T$, each component, $\mathcal{K}(j)$, was implemented
as


$$\mathcal{K}(j) = \frac{1}{\sigma_j\sqrt{2\pi}}\exp\left\{-\frac{(q - \mu_j)^2}{2\sigma_j^2}\right\}, \tag{4.57}$$

where $\sigma_j = .18$. The unit square was divided into an even 3×3 grid and each $\mu_j$
was chosen so that one of the nine Gaussians was centered at the middle of each
grid square. The parameters were chosen as $a = [\,100\ a_{\min}\ \cdots\ a_{\min}\ 100\,]^T$,
with $a_{\min} = .1$ so that only the lower left and upper right Gaussians contributed
significantly to the value of $\phi(q)$, producing a bimodal distribution.
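
As an illustration of this parameterization, the sketch below builds the nine-Gaussian basis on the unit square and evaluates the resulting sensory function. It is not the Matlab code used for the simulations. The ordering of the grid centers (row by row from the lower left) and the normalization constant are assumptions, and the names `CENTERS`, `basis`, and `phi` are hypothetical.

```python
import math

SIGMA = 0.18
A_MIN = 0.1
# Centers of a 3x3 grid of equal squares on the unit square, listed row by row
# starting from the lower left corner (the ordering is an assumption).
CENTERS = [((col + 0.5) / 3.0, (row + 0.5) / 3.0)
           for row in range(3) for col in range(3)]
# Only the first (lower left) and last (upper right) Gaussians get a large weight.
A_TRUE = [100.0] + [A_MIN] * 7 + [100.0]

def basis(q):
    """Vector K(q) of nine Gaussians evaluated at the point q = (x, y)."""
    norm = 1.0 / (SIGMA * math.sqrt(2.0 * math.pi))   # assumed normalization
    return [norm * math.exp(-((q[0] - mx) ** 2 + (q[1] - my) ** 2)
                            / (2.0 * SIGMA ** 2))
            for mx, my in CENTERS]

def phi(q, a=A_TRUE):
    """Sensory function phi(q) = K(q)^T a."""
    return sum(k * weight for k, weight in zip(basis(q), a))

peak_lower_left = phi((1.0 / 6.0, 1.0 / 6.0))
peak_upper_right = phi((5.0 / 6.0, 5.0 / 6.0))
middle = phi((0.5, 0.5))
print(peak_lower_left, peak_upper_right, middle)
```

The two corner values are equal by symmetry and much larger than the value at the middle of the square, which is the bimodal shape described above.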
The robots in the network were started from random initial positions. Each robot
used a copy of the Gaussians described above for $\mathcal{K}(q)$. The estimated parameters $\hat{a}_i$
for each robot were started at a value of $a_{\min}$, and $\Lambda_i$ and $\lambda_i$ were each started at zero.
The gains used by the robots were $K = 3I_2$, $\Gamma = I_9$, $\gamma = 300$ and $\zeta = 0$ for the basic
controller, and $\gamma = 100$ and $\zeta = 50$ for the consensus controller. In practice, the first
integral term in the adaptive law (4.13) seems to have little effect on the performance
of the controller. Choosing $\Gamma$ small and $\gamma$ comparatively large puts more weight on
the second term, which is responsible for integrating measurements of $\phi(p_i)$ into the
parameters. The spatial integrals in (4.7) and (4.13) required for the control law
were computed by discretizing each Voronoi region $V_i$ into a 7 × 7 grid and summing
contributions of the integrand over the grid. Voronoi regions were computed using a
decentralized algorithm similar to the one in [26].

4.8.1  Simulation  Results 

Figure  4-4  shows  the  positions  of  the  robots  in  the  network  over  the  course  of  a 
simulation  run  for  the  parameter  consensus  controller  (left  column)  and  the  basic 
controller  (right  column).  The  centers  of  the  two  contributing  Gaussian  functions 
are  marked  with  xs.  It  is  apparent  from  the  final  configurations  that  the  consensus 
controller  caused  the  robots  to  group  more  tightly  around  the  Gaussian  peaks  than 
the  basic  controller.  The  somewhat  jagged  trajectories  are  caused  by  the  discrete 
nature  of  the  spatial  integration  procedure  used  to  compute  the  control  law. 

Figure  4-5(a)  shows  that  both  controllers  converge  to  a  near-optimal  configuration — 
one  in  which  every  robot  is  located  at  the  estimated  centroid  of  its  Voronoi  region, 
in  accordance  with  Theorem  4.1.  However,  the  true  position  error  also  converged  to 
zero  for  the  consensus  controller,  indicating  that  it  achieved  an  optimal  coverage  con¬ 
figuration,  as  shown  in  Figure  4-5(b).  The  basic  controller  did  not  reach  an  optimal 
coverage  configuration.  Furthermore,  convergence  was  so  much  faster  for  the  consen¬ 
sus  controller  that  we  have  to  use  a  logarithmic  time  scale  to  display  both  curves  on 
the  same  plot.  Again,  the  somewhat  jagged  time  history  is  a  result  of  the  discretized 
spatial  integral  computation  over  the  Voronoi  region. 
Figure 4-4: Simulation results for the parameter consensus controller are shown in
the left column (4-4(a), 4-4(c), and 4-4(e)), and for the basic controller in the right
column (4-4(b), 4-4(d), and 4-4(f)). The Gaussian centers of $\phi(q)$ are marked by the
red ×'s.
(a) Mean Estimated Position Error   (b) Mean True Position Error

Figure 4-5: The estimated position error, $\|\hat{C}_{V_i} - p_i\|$, and the true position error,
$\|C_{V_i} - p_i\|$, averaged over all the robots in the network are shown for the network of
20 robots for both the basic and parameter consensus controllers. The true position
error converges to zero only for the parameter consensus controller, 4-5(b). However,
in accordance with Theorem 4.1, the estimated error converges to zero in both cases,
4-5(a). Note the logarithmic time scale.


Figure 4-6(a) demonstrates that a locally true sensory function approximation
is achieved for each robot over $\Omega_i = \{p_i(\tau) \mid \tau \ge 0,\ w(\tau) > 0\}$, the set of points
along the robot's trajectory with positive weighting. The plot shows the integral in
(4.21) as a function of time, averaged over all the robots in the network, converging
asymptotically to zero. The disagreement among the parameter values of robots is
shown in Figure 4-6(b). The parameters were initialized to $a_{\min}$ for all
robots, so this value starts from zero in both cases. However, the consensus controller
causes the parameters to reach consensus, while for the basic controller the parameters
do not converge to a common value.

Figure 4-7(a) shows that the consensus controller obtained a lower value of the
Lyapunov function at a faster rate than the basic controller, indicating both a
lower-cost configuration and a better function approximation. In fact, Figure 4-7(b) shows
that the parameter errors $\|\tilde{a}_i\|$ actually converged to zero for the consensus controller,
so the conditions for Corollary 4.2 were met. This was also evidenced in Figure 4-5(b)
since the true position error converged to zero. For the basic controller, on the other
hand, the parameters did not converge to the true parameters.
Figure 4-6: The integrated sensory function error, namely $\int_0^t w(\tau)(\mathcal{K}_i^T\tilde{a}_i)^2\,d\tau$, averaged
over all the robots is shown for the basic and consensus controllers in 4-6(a). The
plot demonstrates that each robot converges to a locally true function approximation
over all points along its trajectory with positive weighting, $w(\tau) > 0$, as asserted in
Theorem 4.1. The quantity $\sum_{i=1}^n\tilde{a}_i^T\sum_{j=1}^n(\hat{a}_i - \hat{a}_j)$ is shown in 4-6(b), representing
a measure of the disagreement of parameters among robots. The disagreement
converges to zero for the consensus controller, as asserted in Theorem 4.2, but does not
converge for the basic controller.

4.9  Synopsis 

In this chapter we augmented the distributed coverage controller from Chapter 3
to include learning of the sensory function. The learning controller was proven to
cause  the  robots  to  move  to  the  estimated  centroids  of  their  Voronoi  regions,  while  also 
causing  their  estimate  of  the  sensory  distribution  to  improve  over  time.  Parameter 
coupling  was  introduced  in  the  adaptation  laws  to  increase  parameter  convergence 
rates  and  cause  the  robots’  parameters  to  achieve  a  common  final  value.  The  control 
law  was  demonstrated  in  numerical  simulations  of  a  group  of  20  robots  sensing  over 
an  area  with  a  bimodal  Gaussian  distribution  of  sensory  information. 
Figure 4-7: The Lyapunov function is shown in 4-7(a) for both the basic and parameter
consensus controllers. Notice that the parameter consensus controller results in
a faster decrease and a lower final value of the function. The normed parameter error
$\|\tilde{a}_i\|$ averaged over all robots is shown in 4-7(b). The parameter error converges to zero
with the consensus controller, indicating that the robot trajectories were sufficiently
rich.
Chapter  5


From  Theory  to  Practice: 

Coverage  with  SwarmBots 

5.1  Introduction 

In  Chapter  4  we  introduced  a  distributed,  theoretically-proven  controller  for  a  group 
of  robots  to  provide  sensor  coverage  of  an  environment  while  learning  the  sensory 
function.  In  this  chapter  we  describe  the  algorithmic  and  systems  challenges  we 
solved  to  implement  this  coverage  controller  on  a  group  of  robots.  We  present  results 
of  experiments  with  16  robots.  As  described  in  Chapter  4  and  shown  graphically  in 
Figures  4-1  and  4-2,  we  implement  an  algorithm  in  which  robots  simultaneously  learn 
the  areas  of  the  environment  which  need  to  be  covered,  and  move  to  cover  those  areas. 
The  learning  algorithm  uses  on-line  parameter  adaptation  and  a  consensus  algorithm 
to  approximate  the  sensory  function  from  sensor  measurements. 


5.1.1  Related  Work 

There is little existing experimental work on multi-robot coverage control using the
Voronoi-based method aside from that presented in this thesis. The first experimental
results with Voronoi-based coverage were obtained in [95] with a controller
that approximated the sensory function, though not using learning as in this thesis.
Other experiments were carried out for a time-varying sensory function in [78].
Other multi-robot coverage methods not involving Voronoi tessellations have been
investigated experimentally, however. For example, [23] used both reactive and
deliberative approaches to inspect turbine blades with multiple robots. Preliminary
experiments with one fixed and one moving robot were described in [82], and [51]
describes multi-robot experiments with a lawn mowing-type coverage algorithm.
Experiments of multiple robots covering an environment using an exploration algorithm
without localization information are reported in [6].

5.1.2  Contributions 

The  main  contribution  of  this  chapter  is  as  follows: 

1.  The  controller  from  Chapter  4  with  on-line  learning  of  the  sensory  function  is 
implemented  on  a  group  of  16  SwarmBots.  The  performance  of  the  robot  group 
is  compared  to  that  predicted  by  the  theoretical  results  of  Chapter  4. 

In Section 5.2 we translate the controller from Chapter 4 into an algorithm that is practical
for  implementation  on  robot  platforms  with  limited  computational  resources.  We  also 
enumerate  the  differences  between  the  practical  algorithm  and  the  idealized  controller. 
In  Section  5.3  we  give  results  of  two  experiments  and  show  experimental  snapshots. 
The  algorithm  is  shown  to  operate  in  realistic  situations  in  the  presence  of  noise  on 
sensor  measurements  and  actuator  outputs.  Conclusions  and  discussion  are  in  Section 
5.4. 

5.2  Coverage  Control  Algorithm 

The coverage control algorithm has two components, corresponding to the two spaces
described in Figure 4-1. In position space, the robots pursue their estimated centroids,
given by

$$p_i(t+1) = \hat{C}_{V_i}(t). \tag{5.1}$$

The estimated centroid of the Voronoi region is its geometric center, weighted by
the sensory function approximation. We calculate the discrete approximation of the
centroid of $V_i$ by dividing it up into a set of grid squares. Let the set of center points
of the grid squares be $\hat{V}_i$, and let each grid square have equal area $\Delta q$. Then the estimated
centroid $\hat{C}_{V_i}$ of $V_i$, weighted by $\hat{\phi}_i(q,t)$, is given by

$$\hat{C}_{V_i}(t) = \frac{\sum_{q\in\hat{V}_i} q\,\hat{\phi}_i(q,t)\,\Delta q}{\sum_{q\in\hat{V}_i}\hat{\phi}_i(q,t)\,\Delta q}, \tag{5.2}$$

where $\hat{\phi}_i(q,t)$ is defined by

$$\hat{\phi}_i(q,t) = \mathcal{K}(q)^T\hat{a}_i(t), \tag{5.3}$$

as in Chapter 4, and $\hat{a}_i(t)$ is the estimated parameter vector of robot $i$.
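
The discretized centroid computation can be sketched as follows. This is a minimal illustration rather than the code that ran on the robots: the helper `voronoi_grid` lays a regular grid over the whole square environment and keeps the centers that are at least as close to robot $i$ as to any neighbor, which is one simple way (assumed here) to obtain the grid points of a Voronoi cell. Both function names and the test configuration are hypothetical.

```python
def voronoi_grid(p_i, neighbors, lower, upper, n):
    """Centers of an n x n grid on the square [lower, upper]^2 that lie in the
    Voronoi cell of p_i, i.e. are at least as close to p_i as to any neighbor."""
    step = (upper - lower) / n
    cell = []
    for row in range(n):
        for col in range(n):
            q = (lower + (col + 0.5) * step, lower + (row + 0.5) * step)
            d_i = (q[0] - p_i[0]) ** 2 + (q[1] - p_i[1]) ** 2
            if all(d_i <= (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2
                   for p in neighbors):
                cell.append(q)
    return cell

def estimated_centroid(grid_points, phi_hat):
    """Weighted centroid of a cell from equal-area grid-square centers.

    The common area factor cancels between numerator and denominator,
    so it does not appear explicitly.
    """
    mass = 0.0
    moment = [0.0, 0.0]
    for q in grid_points:
        weight = phi_hat(q)
        mass += weight
        moment[0] += q[0] * weight
        moment[1] += q[1] * weight
    return (moment[0] / mass, moment[1] / mass)

# Two robots on the unit square; robot i owns the left half.
cell = voronoi_grid((0.25, 0.5), [(0.75, 0.5)], 0.0, 1.0, 8)
uniform_centroid = estimated_centroid(cell, lambda q: 1.0)
weighted_centroid = estimated_centroid(cell, lambda q: q[0])
print(uniform_centroid, weighted_centroid)
```

With a uniform estimate the centroid is the geometric center of the cell, while an estimate that grows toward the right pulls the centroid in that direction.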

In parameter space, the robots collaboratively learn the function $\phi(q)$. They do
this by iteratively integrating the values of $\phi(p_i)$ into the quantity $\lambda_i(t)$. They also
integrate the value of the basis function vector at their position $\mathcal{K}(p_i(t))$ into the
quantity $\Lambda_i(t)$. Specifically,

$$\lambda_i(t+1) = \lambda_i(t) + \mathcal{K}(p_i(t))\,\phi(p_i(t)) \quad\text{and} \tag{5.4}$$

$$\Lambda_i(t+1) = \Lambda_i(t) + \mathcal{K}(p_i(t))\,\mathcal{K}(p_i(t))^T. \tag{5.5}$$

Here for simplicity we use a uniform time weighting function, $w(t) = 1$. Each robot
then tunes its parameter vector using

$$\hat{a}_i^{pre}(t) = \hat{a}_i(t) + \gamma\left(\lambda_i(t) - \Lambda_i(t)\hat{a}_i(t)\right) + \zeta\sum_{j\in\mathcal{N}_i(t)}\left(\hat{a}_j(t) - \hat{a}_i(t)\right), \tag{5.6}$$

where $\gamma$ and $\zeta$ are positive gains. We do not use the length of the shared Voronoi face
as a weighting in the parameter tuning (5.6). Instead we use the simpler 0-1 weighting
described in Remark 4.4. As described in Chapter 4, the term $\lambda_i(t) - \Lambda_i(t)\hat{a}_i(t)$
changes the parameters to follow the negative gradient of the Least Squares cost
function. The term $\sum_{j\in\mathcal{N}_i(t)}(\hat{a}_j(t) - \hat{a}_i(t))$ has the effect of propagating every robot's
parameters around the network to be used by every other robot, and ultimately causes
all robots' parameter vectors to approach a common value. Finally, parameters are
maintained above a predefined minimum positive value $a_{\min} \in \mathbb{R}$, $a_{\min} > 0$, using

$$\hat{a}_i(t+1) = \max(\hat{a}_i^{pre}(t),\ a_{\min}), \tag{5.7}$$

where $\max(\cdot,\cdot)$ operates element-wise on the vector $\hat{a}_i^{pre}(t)$. Our consensus-based
coverage algorithm (as executed asynchronously by each robot) is written in
Algorithm 1.

Algorithm 1 Consensus-Based Coverage
Require: Each robot knows its position $p_i(t)$
Require: Each robot can communicate with its Voronoi neighbors
Require: Each robot can compute its Voronoi cell, $V_i$
Require: Each robot can measure $\phi(p_i)$ with its sensors
Initialize: $\Lambda_i(0) = 0$, $\lambda_i(0) = 0$, and $\hat{a}_i(0) = [a_{\min}, \ldots, a_{\min}]^T$
loop
    Update:
        $\lambda_i(t+1) = \lambda_i(t) + \mathcal{K}(p_i(t))\,\phi(p_i(t))$
        $\Lambda_i(t+1) = \Lambda_i(t) + \mathcal{K}(p_i(t))\,\mathcal{K}(p_i(t))^T$
        $\hat{a}_i^{pre}(t) = \hat{a}_i(t) + \gamma(\lambda_i(t) - \Lambda_i(t)\hat{a}_i(t)) + \zeta\sum_{j\in\mathcal{N}_i(t)}(\hat{a}_j(t) - \hat{a}_i(t))$
    Project $\hat{a}_i^{pre}(t)$ to ensure parameters remain positive:
        $\hat{a}_i(t+1) = \max(\hat{a}_i^{pre}(t),\ a_{\min})$
    Compute the robot's Voronoi region $V_i$
    Discretize $V_i$ into grid squares with area $\Delta q$ and center points $q \in \hat{V}_i$
    Compute the centroid estimate:
        $\hat{C}_{V_i}(t) = \frac{\sum_{q\in\hat{V}_i} q\,\hat{\phi}_i(q,t)\,\Delta q}{\sum_{q\in\hat{V}_i}\hat{\phi}_i(q,t)\,\Delta q}$, where $\hat{\phi}_i(q,t) = \mathcal{K}(q)^T\hat{a}_i(t)$
    Drive to the estimated centroid: $p_i(t+1) = \hat{C}_{V_i}(t)$
end loop
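
The parameter-space part of the algorithm can be sketched for a single robot as follows. This is an illustration under assumptions, not the integer-arithmetic implementation used on the robots: the function name `update_parameters`, the list-based data layout, and the gains used in the example calls are hypothetical. The parameter update uses the data accumulated up to time $t$, and the new measurement is integrated afterwards for use at the next step, matching the time indices in the equations above.

```python
def update_parameters(a_hat, lam, Lam, K_p, phi_p, neighbor_params,
                      gamma, zeta, a_min):
    """One pass of the update and projection steps for a single robot.

    a_hat: current parameter estimate (list of m floats)
    lam, Lam: accumulated data vector (m) and matrix (m x m), modified in place
    K_p, phi_p: basis vector and sensor reading at the robot's position
    neighbor_params: list of the neighbors' current parameter vectors
    """
    m = len(a_hat)
    a_pre = []
    for r in range(m):
        # Least Squares gradient term: lam - Lam a_hat
        grad = lam[r] - sum(Lam[r][c] * a_hat[c] for c in range(m))
        # Consensus term with 0-1 weighting over the current neighbors
        coupling = sum(a_j[r] - a_hat[r] for a_j in neighbor_params)
        a_pre.append(a_hat[r] + gamma * grad + zeta * coupling)
    # Integrate the new measurement for use at the next step.
    for r in range(m):
        lam[r] += K_p[r] * phi_p
        for c in range(m):
            Lam[r][c] += K_p[r] * K_p[c]
    # Element-wise projection keeps every parameter at or above a_min.
    return [max(v, a_min) for v in a_pre]

lam, Lam = [0.0], [[0.0]]
first = update_parameters([1.0], lam, Lam, [1.0], 2.0, [[3.0]], 0.1, 0.25, 0.5)
second = update_parameters(first, lam, Lam, [1.0], 2.0, [], 0.1, 0.25, 0.5)
clipped = update_parameters([0.6], [0.0], [[0.0]], [1.0], 2.0, [[-5.0]],
                            0.1, 0.25, 0.5)
print(first, second, clipped)
```

In the first call only the consensus term acts, because no data have been accumulated yet; in the second call the gradient term pulls the estimate toward the measured value; the third call shows the projection holding the parameter at the minimum value.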

In summary, our coverage control algorithm integrates the sensor measurements
and robot trajectory into $\lambda_i \in \mathbb{R}^m$ and $\Lambda_i \in \mathbb{R}^{m\times m}$, respectively. These are then
used to tune the parameter vector $\hat{a}_i(t)$, which is also combined with the neighbors'
parameter vectors. The parameter vector is used to calculate the sensory function
estimate $\hat{\phi}_i(q,t)$, which is used to calculate the estimated Voronoi centroid $\hat{C}_{V_i}$, which
the robot then moves toward. The algorithm is a discrete-time interpretation of the
control law from [100], which, under mild assumptions, was proved to cause robots
to converge to the centroids of their Voronoi cells.

Implementing the control algorithm on a group of robots introduces a number of
complications that were not considered in [100], as described in Table 5.1. The
presence of noise in all measurement and actuation operations is a significant change
from the noiseless scenario considered in [100]. Noise on the position measurements
of neighbors in particular seemed to be a large source of error in the computation
of the centroid of the Voronoi regions. We find that the algorithm performs well
despite the presence of these real-world complications. The robustness of the
algorithm can be attributed to its closed-loop structure, which constantly incorporates
position updates and new sensor measurements to naturally correct mistakes. Also,
the consensus-learning law tends to smooth the effects of noise on the sensory
function measurements. This is because the parameter vectors are iteratively combined
with neighbors' parameter vectors, so inaccuracies that might otherwise accumulate
due to measurement errors are counteracted by measurement errors from neighboring
robots.


5.3  Results  and  Experimental  Snapshots 

The  algorithm  was  implemented  in  integer  arithmetic  on  a  network  of  16  SwarmBots 
[64] (Figure 5-5(a)). Each SwarmBot used an on-board IR system to sense relative
neighbor  positions  (for  computing  its  Voronoi  cell)  and  to  communicate  its  parameter 
vector  to  its  neighbors.  The  robots  moved  in  a  square  environment  2.44m x 2.44m. 
Each  robot’s  global  position  was  measured  by  an  overhead  camera  and  sent  to  it  by 
radio.  Each  SwarmBot  used  a  40MHz  32-bit  ARM  Thumb  microprocessor,  which 
provided  enough  processing  power  to  execute  our  algorithm  in  real-time.  There  was 
no  centralized  or  off-line  processing. 

The system reliably performed numerous experiments and demonstrations. Here
we present detailed results of two experiments. In the first experiment in Section
5.3.1, the robots were given a noiseless measurement of a simulated sensory function
$\phi(q)$. This allowed us to compare the performance of the algorithm to a known
ground truth. Since the function $\phi(q)$ is known, we also know the true position errors
of the robots (the distances to their true centroids), as well as the true parameter
errors. In the second experiment in Section 5.3.2, the robots used their on-board light
sensors to sense light intensity in the environment as a sensory function. In this case
we have no ground truth value for $\phi(q)$. We verify that the algorithm exhibits the
behavior that one would expect given the scenario.

Table 5.1: Algorithm 1 vs. Controller from Chapter 4

| Algorithm 1 | Controller from Chapter 4 |
|---|---|
| Discrete-time difference equations | Continuous-time differential equations |
| Nonholonomic "unicycle" robot dynamics cause position errors and turning delays | Holonomic "integrator" robot dynamics |
| Asynchronous execution of instructions | Synchronous evolution of equations |
| Approximate Voronoi cells constructed from noisy measurements of neighbors within sensing range | Exact Voronoi cells computed from exact positions of all Voronoi neighbors |
| Discretized sums over the Voronoi cell | Exact integrals over the Voronoi cell |
| Noisy measurement of global position | Exact knowledge of global position |
| Noisy actuators | Noiseless actuators |
| Noisy measurement of sensory function | Noiseless measurement of sensory function |
| Basis function approximation cannot reconstruct exact sensory function | Basis function approximation can reconstruct sensory function exactly with ideal parameter vector |


5.3.1  Simulated  Sensory  Function 

The simulated sensory function, $\phi(q)$, was represented by two Gaussians, one in
the lower right of the environment and one in the upper left. The set of basis
functions was chosen to be 9 Gaussians arranged in a grid over the square
environment. In particular, each of the nine components of $\mathcal{K}(q)$ was implemented as
$\frac{1}{2\pi\sigma_j^2}\exp\{-\|q - \mu_j\|^2/(2\sigma_j^2)\}$, where $\sigma_j = .37$ m. The 2.44 m × 2.44 m square was
divided into an even 3×3 grid and each $\mu_j$ was chosen so that one of the 9 Gaussians
was centered at the middle of each grid square. The parameters for the simulated
sensory function were chosen as $a = [\,200\ a_{\min}\ \cdots\ a_{\min}\ 200\,]^T$, with $a_{\min} = 1$
so that only the upper left and lower right Gaussians contributed significantly to the
value of $\phi(q)$.

Figure  5-1  shows  the  positions  of  16  robots  over  the  course  of  an  experiment. 
The  algorithm  caused  the  robots  to  group  around  the  Gaussian  peaks.  The  robots 
had  no  prior  knowledge  of  the  number  or  location  of  the  peaks.  Figure  5-2(a)  shows 
the  distance  to  the  centroid,  averaged  over  all  the  robots.  The  distance  to  the  true 
centroid  decreased  over  time  to  a  steady  value.  The  distance  to  the  estimated  centroid 
decreased  to  a  value  close  to  the  pre-set  dead  zone  of  5cm.  The  significant  noise  in 
the  distance  to  the  estimated  centroid  comes  from  noise  in  the  IR  system  used  to 
measure  the  neighbor  positions.  This  caused  the  Voronoi  cells  to  change  rapidly, 
which  in  turn  caused  the  centroid  estimates  to  be  noisy.  Despite  this  noise,  the  true 
distance  to  the  centroid  decreased  steadily,  indicating  that  the  algorithm  is  robust 
to these significant sources of error. Figure 5-2(b) shows that the normed parameter
error, averaged over all of the robots, decreased over time. Figure 5-2(c) shows
$\sum_{i=1}^n\tilde{a}_i(t)^T\sum_{j\in\mathcal{N}_i}(\hat{a}_i(t) - \hat{a}_j(t))$, representing the disagreement among the parameter
vectors of different robots. The disagreement started at zero because all parameters
were initialized with the same value of $a_{\min}$. The disagreement initially grew, then
decreased as the robots' parameters reached a consensus.
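
The disagreement measure can be computed as in the sketch below. The function name and the adjacency-list representation of the neighbor sets are hypothetical. The sketch uses the parameter estimates themselves rather than the parameter errors: when the neighbor relation is symmetric, subtracting the same ideal parameter vector from every estimate does not change the value, because the inner differences summed over all robots cancel, so the quantity can be evaluated without knowing the ideal parameters.

```python
def disagreement(params, neighbors):
    """Sum over robots i of a_i^T * sum_{j in N_i} (a_i - a_j).

    params: list of parameter vectors, one per robot
    neighbors: neighbors[i] is the list of indices of robot i's neighbors
               (assumed symmetric: j in neighbors[i] iff i in neighbors[j])
    """
    total = 0.0
    for i, a_i in enumerate(params):
        for j in neighbors[i]:
            total += sum(x * (x - y) for x, y in zip(a_i, params[j]))
    return total

line_graph = [[1], [0, 2], [1]]
agreed = disagreement([[1.0, 2.0]] * 3, line_graph)
pair = disagreement([[0.0], [1.0]], [[1], [0]])
shifted = disagreement([[-5.0], [-4.0]], [[1], [0]])
print(agreed, pair, shifted)
```

The value is zero when all robots hold the same parameters, and for a symmetric neighbor relation it equals half the sum of squared differences over all ordered neighbor pairs.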

5.3.2  Measured  Sensory  Function 

An experiment was also carried out using light intensity over the environment as the
sensory function. Two incandescent office lights were placed at the lower left corner
of the environment, and the robots used on-board light sensors to measure the light
intensity. The same 3×3 grid of basis functions as in the first experiment was used. In
this experiment there was no ground truth against which to compare the performance
of the algorithm since we did not know the "true" light intensity function over the
environment. We instead show that the algorithm caused the network to do what
one would expect given the qualitative light intensity distribution.

(a) Initial Snapshot   (b) Middle Snapshot   (c) Final Snapshot

Figure 5-1: Results for the algorithm are shown in video snapshots in the left column
(5-1(a), 5-1(b), and 5-1(c)). The positions collected from the overhead camera for
the same experiment are plotted in the right column (5-1(d), 5-1(e), and 5-1(f)). The
Gaussian centers of $\phi(q)$ are marked by red ×'s.

Figure 5-3 shows snapshots of the experiment taken from the overhead camera.
Notice that the robots collected in higher density around the light sources while
still covering the environment. Figure 5-4(a) shows that the distance to the robots'
estimated centroids decreased, albeit with a significant amount of noise due to
uncertainty in the neighbor position estimates, as in the previous experiment. Figure
5-4(a) also shows the distance to the estimated centroid filtered so that the decreasing
trend becomes more evident. Also, Figure 5-5(b) shows that the robots learned
a function with a large weight near the position of the light sources. The weights on
the 9 Gaussians adjusted to find the best fit of the data. Figure 5-4(b) shows that,
as in the previous experiment, disagreement between robot parameters initially grew,
then decreased as the robots tended toward consensus. The parameters never actually
reached consensus because of noise and calibration differences among the different
robots' light sensors.

(a) Mean Position Error   (b) Mean Parameter Error   (horizontal axes: Time (s))

Figure 5-2: The distance to the actual centroid, and the distance to the estimated
centroid, averaged over all the robots are shown in 5-2(a). The normed parameter
error averaged over all robots is shown in 5-2(b). The plot in 5-2(c) shows a quantity
representing the disagreement of parameters among robots.


5.4  Synopsis 

In this chapter, we implemented a control algorithm for multi-robot coverage on a
minimalist robot platform. The controller was adapted to the hardware platform
available, and was shown to perform robustly despite the presence of sensor and
actuator noise, and other real-world complications. We presented the results of two
experiments with 16 robots. In the first experiment, the robots were given simulated
sensory function measurements so that we could compare the results with a known
ground truth. In the second experiment, the robots used measurements from light
sensors as a sensory function. We hope these results represent a significant step
toward the use of multi-robot coverage control algorithms in practical monitoring
and surveillance applications in the future.

Figure 5-3: Results for the algorithm are shown in video snapshots in the left column
(5-3(a), 5-3(b), and 5-3(c): initial, middle, and final snapshots). The positions
collected from the overhead camera for the same experiment are plotted in the right
column (5-3(d), 5-3(e), and 5-3(f)). The robots used the light intensity measured
with on-board light sensors as the sensory function.

Figure 5-4: The distance to the estimated centroid, averaged over all the robots in
the network is shown in 5-4(a). The plot in 5-4(b) shows a quantity representing the
disagreement of parameters among robots.

Figure 5-5: (a) SwarmBot, (b) Function Approximation. The iRobot SwarmBot platform
is shown in 5-5(a). The basis function approximation of the light intensity (smooth
surface) over the area for one robot is shown in 5-5(b) superimposed over a triangular
interpolation of the light intensity measurements of all the robots (jagged surface).

Chapter 6


Coverage  with  Quad-Rotors 

6.1  Introduction 

In this chapter we apply the theory from Chapter 3 to control flying robots with
downward facing cameras. This work demonstrates how to incorporate a realistic
sensor model of the camera to obtain an appropriate cost function $\mathcal{H}$ to derive the
gradient-based controller from Equation (2.7). The computation of the gradient
controller in this case is more difficult, and the application of the gradient convergence
and stability theorems (Theorems 2.3 and 2.4) requires the verification of some more
intricate technical details.

Multiple collaborating robots with cameras are useful in a broad range of applications,
from surveying disaster sites, to observing the health of coral reefs. However, an
immediate and difficult question arises in such applications: how should one position
the robots so as to maintain the best view of an environment? In this chapter we offer
an approach motivated by an information content principle: minimum information
per pixel. Using information per pixel as a metric allows for the incorporation of
physical, geometric, and optical parameters to give a cost function that represents
how well a group of cameras covers an environment. We develop the approach in detail
for the particular case of multiple downward facing cameras mounted to robots.
The cost function leads to a gradient-based distributed controller for the robots to
position themselves in three dimensions so as to best observe a planar environment
over which they hover. We present simulation results in a Matlab environment. We
also present experimental results with three AscTec Hummingbird quad-rotor robots.

Figure 6-1: This snapshot of an experiment shows three flying quad-rotor robots
moving so that their cameras cover the environment represented by the white polygon.

Our  algorithm  can  be  used  in  support  of  a  higher-level  computer  vision  task,  such 
as  object  recognition  or  tracking.  We  address  the  problem  of  how  to  best  position  the 
robots  given  that  the  data  from  their  cameras  will  be  used  by  some  computer  vision 
algorithm.  Our  design  principle  can  be  readily  adapted  to  a  number  of  applications. 
For  example,  it  could  be  used  to  control  groups  of  autonomous  underwater  or  air 
vehicles to do mosaicing [79], or to produce photometric stereo from multiple camera
views [41], for inspection of underwater or land-based archaeological sites, biological
environments such as coral reefs or forests, disaster sites, or any other large scale
environment of interest. Our algorithm could also be used by autonomous flying robots
to do surveillance [21], target tracking [12, 18, 50], or to aid in navigation of agents on
the ground [81].

6.1.1 Related Work


One  recent  extension  described  in  [63],  Figure  14,  proposed  an  algorithm  for  the 
placement  of  hovering  sensors,  similar  to  our  scenario.  Our  method  in  this  chapter 
is  related  to  this  work  in  that  we  propose  a  cost  function  and  obtain  a  distributed 
controller  by  taking  its  gradient.  However,  the  cost  function  we  propose  is  different 
from  previous  ones  in  that  it  does  not  involve  a  Voronoi  partition.  To  the  contrary,  it 
relies  on  the  fields  of  view  of  multiple  cameras  to  overlap  with  one  another.  Another 
distinction  from  previous  works  is  that  the  agents  we  consider  move  in  a  space  that  is 
different  from  the  one  they  cover.  Previous  coverage  scenarios  have  considered  agents 
constrained  to  move  in  the  environment  that  they  cover,  which  leads  to  a  constraint 
that  the  environment  must  be  convex  (to  prevent  agents  from  trying  to  leave  the 
environment). In contrast, we consider agents moving in a space $\mathbb{R}^3$, covering an
arbitrary lower dimensional environment $Q \subset \mathbb{R}^2$. This eliminates the need for $Q$
to  be  convex.  Indeed,  it  need  not  even  be  connected.  It  must  only  be  Lebesgue 
measurable  (since  the  robots  will  calculate  integrals  over  it),  which  is  quite  a  broad 
specification. 

There have also been other algorithms for camera placement. For example, a probabilistic
approach for general sensor deployment based on the Cramér-Rao bound was
proposed in [42], and an application of the idea for cameras was given in [31]. We
choose to focus on the problem of positioning downward facing cameras, similarly
to [54], as opposed to arbitrarily oriented cameras. Many geometrical aspects of the
problem are significantly simplified in this setting, yet there are a number of practical
applications that stand to benefit from controlling cameras in this way, as previously
described. More generally, several other works have considered cooperative control
with flying robots and UAVs. For an excellent review of cooperative UAV control
please see [85], or [11] and [83] for two recent examples.

6.1.2  Contributions 

The main contributions of this chapter are:

1.  Applying the ideas of Chapter 3, we propose the minimum information per
pixel principle to formulate a cost function for multiple hovering robots with
downward facing cameras. We use the cost function to design a gradient descent
controller to deploy multiple robots to their optimal positions in a distributed
fashion.

2.  We implement the proposed controller on three quad-rotor robots and test its
performance in experiments.

The  proposed  robot  coordination  algorithm  is  fully  decentralized,  provably  stable, 
adaptive  to  a  changing  number  of  flying  agents  and  a  changing  environment,  and  will 
work  with  a  broad  class  of  environment  geometries,  including  convex,  non-convex, 
and  disconnected  spaces. 

6.2  Optimal  Camera  Placement 

We motivate our approach with an informal justification of a cost function, then
develop the problem formally for the single camera case followed by the multi-camera
case. We desire to cover a bounded environment, $Q \subset \mathbb{R}^2$, with a number of cameras.
We assume $Q$ is planar, without topography, to avoid the complications of changing
elevation or occlusions. As in previous chapters, let $p_i \in \mathcal{P}$ represent the state of
camera $i$, where the state-space, $\mathcal{P}$, will be characterized later. We want to control $n$
cameras in a distributed fashion such that their placement minimizes the aggregate
information per camera pixel over the environment,

$$\min_{(p_1, \ldots, p_n) \in \mathcal{P}^n} \int_Q \frac{\text{info}}{\text{pixel}} \, dq.$$

This metric makes sense because the pixel is the fundamental information capturing
unit of the camera. Consider the patch of image that is exposed to a given
pixel. The information in that patch is reduced by the camera to a low-dimensional
representation (i.e. mean color and brightness over the patch). Therefore, the less
information content the image patch contains, the less information will be lost in its
low-dimensional representation by the pixel. Furthermore, we want to minimize the
accumulated information loss due to pixelation over the whole environment $Q$, hence
the integral. In the next two sections we will formalize the notion of information per
pixel.

6.2.1  Single  Camera 

We develop the cost function for a single camera before generalizing to multiple
cameras. It is convenient to consider the information per pixel as the product of
two functions, $f : \mathcal{P} \times Q \mapsto (0, \infty]$, which gives the area in the environment seen
by one pixel (the "area per pixel" function), and $\phi : Q \mapsto (0, \infty)$ which gives the
information per area in the environment. The form of $f(p_i, q)$ will be derived from
the optics of the camera and geometry of the environment. The function $\phi(q)$ is a
positive weighting of importance over $Q$ and should be specified beforehand (it can
also be learned from sensor data, as in Chapter 4). For instance, if all points in the
environment are equally important, $\phi(q)$ should be constant over $Q$. If some known
area in $Q$ requires more resolution, the value of $\phi(q)$ should be larger in that area
than elsewhere in $Q$. This gives the cost function

$$\min_p \int_Q f(p, q) \phi(q) \, dq, \tag{6.1}$$

which  is  of  a  general  form  similar  to  the  one  seen  in  Chapter  3.  We  will  introduce 
significant  changes  to  this  basic  form  with  the  addition  of  multiple  cameras. 

The state of the camera, $p$, consists of all parameters associated with the camera
that affect the area per pixel function, $f(p, q)$. In a general setting one might consider
the camera's position in $\mathbb{R}^3$, its orientation in $so(3)$ (the three rotational angles),
and perhaps a lens zooming parameter in the interval $(0, \infty)$, thus leading to an
optimization in a rather complicated state-space ($\mathcal{P} = \mathbb{R}^3 \times so(3) \times (0, \infty)$) for only
one camera. For this reason, we consider the special case in which the camera is
downward facing (hovering over $Q$). Indeed, this case is of particular interest in
many applications, as described in Section 6.1. We define the field of view, $B$, to be
the intersection of the cone whose vertex is the focal point of the camera lens with
the subspace that contains the environment, as shown in Figure 6-2. In Section 6.3.1
we will consider a camera with a rectangular field of view, but initially consider a
circular field of view, so the rotational orientation of the downward facing camera
is irrelevant. In this case $\mathcal{P} = \mathbb{R}^3$, and the state-space in which we do optimization
is considerably simplified from that of the unconstrained camera. Decompose the
camera position as $p = [c^T, z]^T$, with $c \in \mathbb{R}^2$ the center point of the field of view, and
$z \in \mathbb{R}$ the height of the camera over $Q$. We have

$$B = \{ q \mid \|q - c\| \le z \tan\theta \}, \tag{6.2}$$

where $\theta$ is the half-angle of view of the camera.

Figure 6-2: The camera optics and the geometry of the environment are shown in
this figure.

To find the area per pixel function, $f(p, q)$, consider the geometry in Figure 6-2.
Let $b$ be the focal length of the lens. Inside $B$, the area/pixel is equal to the inverse of
the area magnification factor (which is defined from classical optics to be $b^2/(b - z)^2$)
times the area of one pixel [40]. Define $a$ to be the area of one pixel divided by the
square of the focal length of the lens. We have,

$$f(p, q) = \begin{cases} a(b - z)^2 & \text{for } q \in B \\ \infty & \text{otherwise.} \end{cases} \tag{6.3}$$

Figure 6-3: This figure shows the relevant quantities involved in characterizing the
intersecting fields of view of two cameras.

Outside of the field of view, there are no pixels, therefore the area per pixel is infinite
(we will avoid dealing with infinite quantities in the multi-camera case). The cost
function in (6.1) takes on an infinite value if any area (of non-zero measure) of $Q$ is
outside of the field of view. Indeed, we know there exists a $p \in \mathcal{P}$ such that the cost
is finite, since $Q$ is bounded (given $c$ and $\theta$, there exists $z \in \mathbb{R}$ such that $Q \subset B$).
Therefore, we can write the equivalent constrained optimization problem

$$\min_p \int_Q a(b - z)^2 \phi(q) \, dq, \tag{6.4}$$

$$\text{subject to } Q \subset B.$$

One  can  see  in  this  simple  scenario  that  the  optimal  solution  is  for  p  to  be  such  that 
the  field  of  view  is  the  smallest  ball  that  contains  Q.  However,  with  multiple  cameras, 
the  problem  becomes  more  challenging. 
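
The single-camera quantities above can be sketched numerically. The following is a minimal illustration, not the thesis implementation: it assumes the circular field of view of (6.2) and the area per pixel function of (6.3), represents $Q$ by a finite list of sample points, and uses hypothetical function names chosen for this example only.

```python
import math

def in_field_of_view(c, z, theta, q):
    """True if q lies in the circular field of view B of (6.2)."""
    return math.hypot(q[0] - c[0], q[1] - c[1]) <= z * math.tan(theta)

def area_per_pixel(c, z, theta, q, a, b):
    """f(p, q) of (6.3): a (b - z)^2 inside B, infinity outside."""
    if in_field_of_view(c, z, theta, q):
        return a * (b - z) ** 2
    return math.inf

def lowest_covering_height(c, theta, q_samples):
    """Smallest height z at which every sample point of Q lies in B, for a fixed center c.

    Assumption of this sketch: Q is represented by the finite list q_samples.
    """
    farthest = max(math.hypot(qx - c[0], qy - c[1]) for qx, qy in q_samples)
    return farthest / math.tan(theta)
```

For a fixed center $c$ the height returned by `lowest_covering_height` is the lowest one that satisfies the constraint $Q \subset B$ on the sample points. Choosing $c$ as well, so that the field of view is the smallest ball containing $Q$, is a separate step that this sketch does not attempt.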

6.2.2  Multiple  Cameras 

To find optimal positions for multiple cameras, we have to determine how to account
for the area of overlap of the images of the cameras, as shown in Figure 6-3. Intuitively,
an area of $Q$ that is being observed by two different cameras is better covered than if it
were being observed by only one camera, but it is not twice as well covered. Consider
a point $q$ that appears in the image of $n$ different cameras. The number of pixels
per area at that point is the sum of the pixels per area for each camera. Therefore
(assuming the cameras are identical, so they use the same function $f(p_i, q)$) the area
per pixel at that point is given by the inverse of the sum of the inverse of the area
per pixel for each camera, or

$$\frac{\text{area}}{\text{pixel}} = \left( \sum_{i=1}^{n} f(p_i, q)^{-1} \right)^{-1},$$

where $p_i$ is the position of the $i$th camera. We emphasize that it is the pixels per
area that sum because of the multiple cameras, not the area per pixel because, in the
overlap region, multiple pixels are observing the same area. Therefore the inverse of
the sum of inverses is unavoidable. Incidentally, this is the same form one would use
to combine the variances of multiple noisy measurements when doing sensor fusion.


Finally, we introduce a prior area per pixel, $w \in (0, \infty)$. The interpretation of
the prior is that there is some pre-existing photograph of the environment (e.g. an
initial reconnaissance photograph), from which we can get a base-line area per pixel
measurement. This is compatible with the rest of our scenario, since we will assume
that the robots have knowledge of the geometry of the environment $Q$, and some
notion of information content over it, which could also be derived from a pre-existing
photograph. This pre-existing information can be arbitrarily vague ($w$ can
be arbitrarily large) but it must exist. The prior also has the benefit of making the
cost function finite for all robot positions. It is combined with the camera sensors as
if it were another camera to get

$$\frac{\text{area}}{\text{pixel}} = \left( \sum_{i=1}^{n} f(p_i, q)^{-1} + w^{-1} \right)^{-1}.$$


Let $\mathcal{N}_q$ be the set of indices of cameras for which $q$ is in the field of view,
$\mathcal{N}_q = \{ i \mid q \in B_i \}$. We can now write the area per pixel function as

$$g_q(f(p_1, q), \ldots, f(p_n, q)) = \left( \sum_{i \in \mathcal{N}_q} f(p_i, q)^{-1} + w^{-1} \right)^{-1}, \tag{6.5}$$

which is very similar to the mixing function $g_\alpha$, with $\alpha = -1$, from Chapter 3. The
only difference is the prior $w$. In a probabilistic setting, as in Section 3.3.2, this
would represent the variance of a Gaussian prior. Forming the standard cost function
as in Chapter 3, $g_q$ is integrated over the environment to give the cost function

$$\mathcal{H}(p_1, \ldots, p_n) = \int_Q g_q(f(p_1, q), \ldots, f(p_n, q)) \phi(q) \, dq. \tag{6.6}$$

We will often refer to $g_q$ and $\mathcal{H}$ without their arguments. Now we can pose the
multi-camera optimization problem,

$$\min_{(p_1, \ldots, p_n) \in \mathcal{P}^n} \mathcal{H}. \tag{6.7}$$

The cost function (6.6) is of a general form valid for any area per pixel function
$f(p_i, q)$, and for any camera state space $\mathcal{P}$ (including cameras that can swivel
on gimbals). We proceed with the special case of downward facing cameras, where
$\mathcal{P} = \mathbb{R}^3$ and $f(p_i, q)$ is from (6.3) for the remainder of the chapter.
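
As an illustration of (6.5) and (6.6), the sketch below evaluates the combined area per pixel at a point and approximates the cost by a Riemann sum. It is a hedged example under stated assumptions: cameras are tuples `(cx, cy, z)` with the circular field of view of (6.2), $Q$ is represented by a list of grid-cell centers `grid` with a common `cell_area`, and all function names are hypothetical.

```python
import math

def f(p, q, a, b, theta):
    """Area per pixel (6.3) for a downward facing camera p = (cx, cy, z)."""
    cx, cy, z = p
    inside = math.hypot(q[0] - cx, q[1] - cy) <= z * math.tan(theta)
    return a * (b - z) ** 2 if inside else math.inf

def g_q(cameras, q, w, a, b, theta):
    """Combined area per pixel (6.5): inverse of the summed inverses, including the prior w."""
    total = 1.0 / w
    for p in cameras:
        fi = f(p, q, a, b, theta)
        if fi != math.inf:          # cameras that do not see q contribute nothing
            total += 1.0 / fi
    return 1.0 / total

def cost(cameras, grid, cell_area, phi, w, a, b, theta):
    """Riemann-sum approximation of the cost (6.6) over the grid points of Q."""
    return sum(g_q(cameras, q, w, a, b, theta) * phi(q) for q in grid) * cell_area
```

With no camera in view the function returns the prior $w$, which is what keeps the cost finite for all robot positions. The sketch assumes $z \neq b$ so that the area per pixel of a viewing camera is nonzero.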

6.3  Distributed  Control 

We will take the gradient of (6.6) and find that it is distributed among agents. This
will lead to a gradient-based controller. We will use the notation $g_{q,i}$ to mean

$$g_{q,i} = \left( \sum_{j \in \mathcal{N}_q \setminus \{i\}} f(p_j, q)^{-1} + w^{-1} \right)^{-1}, \tag{6.8}$$

where $\mathcal{N}_q \setminus \{i\}$ is the set of all indices in $\mathcal{N}_q$, except for $i$.

Theorem 6.1 (Gradient Component) The gradient of the cost function $\mathcal{H}(p_1, \ldots, p_n)$
with respect to a robot's position $p_i$, using the area per pixel function in (6.3), is given
by

$$\frac{\partial \mathcal{H}}{\partial c_i} = \int_{Q \cap \partial B_i} (g_q - g_{q,i}) \frac{(q - c_i)}{\|q - c_i\|} \phi(q) \, dq, \tag{6.9}$$

and

$$\frac{\partial \mathcal{H}}{\partial z_i} = \int_{Q \cap \partial B_i} (g_q - g_{q,i}) \phi(q) \tan\theta \, dq - \int_{Q \cap B_i} \frac{2 g_q^2}{a(b - z_i)^3} \phi(q) \, dq. \tag{6.10}$$


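Proof 6.1 We can break up the domain of integration into two parts as

$$\mathcal{H} = \int_{Q \cap B_i} g_q \phi(q) \, dq + \int_{Q \setminus B_i} g_{q,i} \phi(q) \, dq.$$

Only the integrand in the first integral is a function of $p_i$ since the condition $i \in \mathcal{N}_q$
is true if and only if $q \in B_i$ (from the definition of $\mathcal{N}_q$). However the boundaries
of both terms are functions of $p_i$, and will therefore appear in boundary terms in the
derivative. Using the standard rule for differentiating an integral, with the symbol $\partial \cdot$
to mean boundary of a set, we have

$$\frac{\partial \mathcal{H}}{\partial p_i} = \int_{Q \cap B_i} \frac{\partial g_q}{\partial p_i} \phi(q) \, dq + \int_{\partial(Q \cap B_i)} g_q \phi(q) \frac{\partial q_{\partial(Q \cap B_i)}}{\partial p_i}^T n_{\partial(Q \cap B_i)} \, dq + \int_{\partial(Q \setminus B_i)} g_{q,i} \phi(q) \frac{\partial q_{\partial(Q \setminus B_i)}}{\partial p_i}^T n_{\partial(Q \setminus B_i)} \, dq, \tag{6.11}$$

where $q_{\partial \cdot}$ is a point on the boundary of a set expressed as a function of $p_i$, and $n_{\partial \cdot}$
is the outward pointing normal vector of the boundary of the set. Decomposing the
boundary further, we find that $\partial(Q \cap B_i) = (\partial Q \cap B_i) \cup (Q \cap \partial B_i)$ and
$\partial(Q \setminus B_i) = (\partial Q \setminus B_i) \cup (Q \cap \partial B_i)$. But points on $\partial Q$ do not change as a
function of $p_i$, therefore we have

$$\frac{\partial q_{(\partial Q \cap B_i)}}{\partial p_i} = 0 \quad \forall q \in \partial Q \cap B_i \qquad \text{and} \qquad \frac{\partial q_{(\partial Q \setminus B_i)}}{\partial p_i} = 0 \quad \forall q \in \partial Q \setminus B_i.$$

Furthermore, everywhere in the set $Q \cap \partial B_i$ the outward facing normal of $\partial(Q \setminus B_i)$ is
the negative of the outward facing normal of $\partial(Q \cap B_i)$,

$$n_{\partial(Q \setminus B_i)} = -n_{\partial(Q \cap B_i)} \quad \forall q \in Q \cap \partial B_i.$$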


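Simplifying (6.11) leads to

$$\frac{\partial \mathcal{H}}{\partial c_i} = \int_{Q \cap \partial B_i} (g_q - g_{q,i}) \phi(q) \frac{\partial q_{(Q \cap \partial B_i)}}{\partial c_i}^T n_{(Q \cap \partial B_i)} \, dq, \tag{6.12}$$

and

$$\frac{\partial \mathcal{H}}{\partial z_i} = \int_{Q \cap \partial B_i} (g_q - g_{q,i}) \phi(q) \frac{\partial q_{(Q \cap \partial B_i)}}{\partial z_i}^T n_{(Q \cap \partial B_i)} \, dq - \int_{Q \cap B_i} \frac{2 g_q^2}{a(b - z_i)^3} \phi(q) \, dq, \tag{6.13}$$

where we used the fact that $\partial g_q / \partial c_i = [0 \; 0]^T$, and a straightforward calculation yields
$\partial g_q / \partial z_i = -2 g_q^2 / (a(b - z_i)^3)$. Now we solve for the boundary terms,

$$\frac{\partial q_{(Q \cap \partial B_i)}}{\partial c_i}^T n_{(Q \cap \partial B_i)} \qquad \text{and} \qquad \frac{\partial q_{(Q \cap \partial B_i)}}{\partial z_i}^T n_{(Q \cap \partial B_i)},$$

which generally can be found by implicitly differentiating the constraint that describes
the boundary. Henceforth we will drop the subscript on $q$, but it should be understood
that we are referring to points, $q$, constrained to lie on the set $Q \cap \partial B_i$. A point $q$ on
the boundary set $Q \cap \partial B_i$ will satisfy

$$\|q - c_i\| = z_i \tan\theta, \tag{6.14}$$

and the outward facing normal on the set $Q \cap \partial B_i$ is given by

$$n_{(Q \cap \partial B_i)} = \frac{(q - c_i)}{\|q - c_i\|}.$$

Differentiate (6.14) implicitly with respect to $c_i$ to get

$$\left( \frac{\partial q}{\partial c_i}^T - I_2 \right) (q - c_i) = 0,$$

where $I_2$ is the $2 \times 2$ identity matrix, therefore

$$\frac{\partial q}{\partial c_i}^T \frac{(q - c_i)}{\|q - c_i\|} = \frac{(q - c_i)}{\|q - c_i\|},$$

which gives the boundary terms for (6.12). Now differentiate (6.14) implicitly with
respect to $z_i$ to get

$$\frac{\partial q}{\partial z_i}^T \frac{(q - c_i)}{\|q - c_i\|} = \tan\theta,$$

which gives the boundary term for (6.13). The derivative of the cost function $\mathcal{H}$ with
respect to $p_i$ can now be written as in Theorem 6.1.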
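
The "straightforward calculation" $\partial g_q / \partial z_i = -2 g_q^2 / (a(b - z_i)^3)$ used in the proof can be checked numerically. The sketch below is only a sanity check under assumptions: it considers a single point $q$ seen by every camera in the list `zs` of heights, and compares the closed form against a central finite difference. The function names and the step size `h` are choices made for this example.

```python
def g(zs, a, b, w):
    """g_q of (6.5) at a point q that is in the field of view of every camera in zs."""
    return 1.0 / (sum(1.0 / (a * (b - z) ** 2) for z in zs) + 1.0 / w)

def dg_dz_analytic(zs, i, a, b, w):
    """Closed form used in the proof: -2 g_q^2 / (a (b - z_i)^3)."""
    return -2.0 * g(zs, a, b, w) ** 2 / (a * (b - zs[i]) ** 3)

def dg_dz_numeric(zs, i, a, b, w, h=1e-6):
    """Central finite difference of g_q with respect to the height z_i."""
    up = list(zs)
    dn = list(zs)
    up[i] += h
    dn[i] -= h
    return (g(up, a, b, w) - g(dn, a, b, w)) / (2.0 * h)
```

When the camera height exceeds the focal length ($z_i > b$), the factor $(b - z_i)^3$ is negative and the derivative is positive, so the area per pixel inside the field of view grows with height. This matches the tendency of the second integral in (6.10) to pull the robot downward under gradient descent.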


Remark 6.1 (Intuition) We will consider a controller that moves a robot in the
opposite direction of its gradient component. In that case, the single integral for
the lateral component (6.9) causes the robot to move to increase the amount of the
environment in its field of view, while also moving away from other robots $j$ whose
field of view overlaps with its own. The vertical component (6.10) has two integrals
with competing tendencies. The first integral causes the robot to move up to bring
more of the environment into its field of view, while the second integral causes it to
move down to get a better look at the environment already in its field of view.

Remark 6.2 (Requirements) Both the lateral (6.9) and vertical (6.10) components
can be computed by robot $i$ with knowledge of 1) its own position, $p_i$, 2) the
extent of the environment $Q$, 3) the information per area function $\phi(q)$, and 4) the
positions of all other robots whose fields of view intersect with its own (which can be
found by communication or sensing).

Remark 6.3 (Network Requirements) The requirement that a robot can communicate
with all other robots whose fields of view intersect with its own describes
a minimal network graph for our controller to be feasible. In particular, we require
the network to be at least a proximity graph in which all agents $i$ are connected to all
other agents $j \in \mathcal{N}_i$, where $\mathcal{N}_i = \{ j \mid Q \cap B_i \cap B_j \neq \emptyset, \; i \neq j \}$. The controller can be
run over a network that is a subgraph of the required proximity graph, in which case
performance will degrade gracefully as the network becomes more sparse.
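
The neighbor set of Remark 6.3 can be approximated on a sampled environment. The following sketch is illustrative only and rests on assumptions: circular fields of view as in (6.2), cameras given as tuples `(cx, cy, z)`, and $Q$ represented by a finite list `q_samples`, so an overlap smaller than the sample spacing can be missed. The function names are hypothetical.

```python
import math

def sees(p, q, theta):
    """True if q is inside the circular field of view of the camera p = (cx, cy, z)."""
    cx, cy, z = p
    return math.hypot(q[0] - cx, q[1] - cy) <= z * math.tan(theta)

def neighbor_set(i, cameras, theta, q_samples):
    """Indices j != i whose field of view overlaps that of camera i at some sample of Q."""
    return {
        j
        for j, pj in enumerate(cameras)
        if j != i
        and any(sees(cameras[i], q, theta) and sees(pj, q, theta) for q in q_samples)
    }
```

A robot with an empty neighbor set needs no communication to evaluate its gradient component, since $g_{q,i}$ then reduces to the prior $w$ everywhere in its field of view.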

We form the controller using the gradient according to the standard multi-robot
controller in Equation (2.7),

$$\dot{p}_i = -k \frac{\partial \mathcal{H}}{\partial p_i}. \tag{6.15}$$

We can prove the convergence of this controller to locally minimize the aggregate
information per area.

Theorem 6.2 (Convergence) For a network of $n$ robots with the closed-loop
dynamics in (6.15),

$$\lim_{t \to \infty} \frac{\partial \mathcal{H}}{\partial p_i} = 0 \quad \forall i \in \{1, \ldots, n\}. \tag{6.16}$$

Proof 6.2 The proof is an application of Theorem 2.3 from Chapter 2. The closed-loop
dynamics $\dot{p}_i = -k \, \partial \mathcal{H} / \partial p_i$ are a gradient system. We must only show that all
evolutions of the system are bounded. To see this, consider a robot at $p_i$ such that
$Q \cap B_i = \emptyset$. Then $\dot{p}_i = 0$ for all time (if the field of view leaves $Q$, the robot stops
for all time), so $c_i(t)$ is bounded. Given $Q \cap B_i \neq \emptyset$, $\mathcal{H}$ is radially unbounded (i.e.
coercive) in $z_i$, therefore $\dot{\mathcal{H}} \le 0$ implies that $z_i$ is bounded for all time. Therefore, all
conditions of Theorem 2.3 are satisfied and the trajectories of the system converge to
the set of critical points.

To be more precise, there may exist configurations at which $\partial \mathcal{H} / \partial p_i = 0 \; \forall i$ that
are saddle points or local maxima of $\mathcal{H}$. However, since the controller is a gradient
controller, only isolated local minima of $\mathcal{H}$ are stable equilibria, according to Theorem
2.4.

This controller can be implemented in a discretized setting as Algorithm 2. In
general, the integrals in the controller must be computed using a discretized
approximation. Let $\widehat{Q \cap \partial B_i}$ and $\widehat{Q \cap B_i}$ be the discretized sets of grid points representing
the sets $Q \cap \partial B_i$ and $Q \cap B_i$, respectively. Let $\Delta q$ be the length of an arc segment
for the discretized set $\widehat{Q \cap \partial B_i}$, and the area of a grid square for the discretized set
$\widehat{Q \cap B_i}$. A simple algorithm that approximates (6.15) is then given in Algorithm 2.


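Algorithm 2 Discretized Controller

Require: Robot $i$ knows its position $p_i$, the extent of the environment $Q$, and the
information per area function $\phi(q)$.

Require: Robot $i$ can communicate with all robots $j$ whose field of view intersects
with its own.

loop

    Communicate with neighbors to get $p_j$

    Compute and move to

    $$c_i(t + \Delta t) = c_i(t) - k \sum_{q \in \widehat{Q \cap \partial B_i}} (g_q - g_{q,i}) \frac{(q - c_i)}{\|q - c_i\|} \phi(q) \Delta q$$

    Compute and move to

    $$z_i(t + \Delta t) = z_i(t) - k \sum_{q \in \widehat{Q \cap \partial B_i}} (g_q - g_{q,i}) \phi(q) \tan\theta \, \Delta q + k \sum_{q \in \widehat{Q \cap B_i}} \frac{2 g_q^2}{a(b - z_i)^3} \phi(q) \Delta q$$

end loop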
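
One pass of the loop body can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the implementation used in the experiments: the field of view is the circle of (6.2), the boundary is sampled at `n_arc` equally spaced arc points, the interior uses a list `grid` of cell centers of $Q$ with common `cell_area`, `in_Q` and `phi` are user-supplied callables, $z_i \neq b$, and every function and parameter name is hypothetical.

```python
import math

def g_without(i, cameras, q, a, b, w, theta):
    """g_{q,i} of (6.8): combined area per pixel at q, leaving out camera i."""
    total = 1.0 / w
    for j, (cx, cy, z) in enumerate(cameras):
        if j != i and math.hypot(q[0] - cx, q[1] - cy) <= z * math.tan(theta):
            total += 1.0 / (a * (b - z) ** 2)
    return 1.0 / total

def controller_step(i, cameras, in_Q, phi, grid, cell_area, k, a, b, w, theta, n_arc=360):
    """One discretized update of robot i; returns its new (cx, cy, z)."""
    cx, cy, z = cameras[i]
    r = z * math.tan(theta)              # radius of the field of view B_i
    f_i = a * (b - z) ** 2               # area per pixel of camera i inside B_i
    d_arc = 2.0 * math.pi * r / n_arc    # arc length per boundary sample

    def g_with(q):
        # g_q at a point seen by camera i, obtained by adding camera i to g_{q,i}
        return 1.0 / (1.0 / g_without(i, cameras, q, a, b, w, theta) + 1.0 / f_i)

    sum_cx = sum_cy = sum_z_boundary = 0.0
    for s in range(n_arc):               # samples of the boundary of B_i that lie in Q
        ang = 2.0 * math.pi * s / n_arc
        nx, ny = math.cos(ang), math.sin(ang)   # outward normal (q - c_i)/||q - c_i||
        q = (cx + r * nx, cy + r * ny)
        if not in_Q(q):
            continue
        diff = (g_with(q) - g_without(i, cameras, q, a, b, w, theta)) * phi(q) * d_arc
        sum_cx += diff * nx
        sum_cy += diff * ny
        sum_z_boundary += diff * math.tan(theta)

    sum_z_interior = 0.0
    for q in grid:                       # samples of Q that lie inside B_i
        if math.hypot(q[0] - cx, q[1] - cy) <= r:
            sum_z_interior += 2.0 * g_with(q) ** 2 / (a * (b - z) ** 3) * phi(q) * cell_area

    return (cx - k * sum_cx,
            cy - k * sum_cy,
            z - k * sum_z_boundary + k * sum_z_interior)
```

On a boundary point the sketch builds $g_q$ from $g_{q,i}$ instead of testing membership in $B_i$, which avoids floating-point ambiguity exactly on the circle. Calling the function in a loop for every robot, with positions exchanged between passes, gives a centralized simulation of the controller.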


Remark 6.4 (Time Complexity) To determine the computational complexity of
this algorithm, let us assume that there are $m$ points in both sets $\widehat{Q \cap \partial B_i}$ and $\widehat{Q \cap B_i}$.
We can now calculate the time complexity as

$$T(n, m) \le \sum_{j=1}^{m} \Big( O(1) + \sum_{k=1}^{n} O(1) \Big) + \sum_{j=1}^{m} \Big( O(1) + \sum_{k=1}^{n} O(1) + \sum_{k=1}^{n-1} O(1) \Big) \in O(nm).$$

When calculating the controller for all robots on a centralized processor (as was done
for the simulations in Section 6.5), the time complexity becomes $T(n, m) \in O(n^2 m)$.

Remark 6.5 (Adaptivity) The controller is adaptive in the sense that it will stably
reconfigure if any number of robots fail. It will also work with nonconvex environments,
$Q$, including disconnected ones. In the case of a disconnected environment,
the robots may (or may not, depending on the specific scenario) split into a number of
sub-groups that are not in communication with one another. The controller can also
track changing environments, $Q$, and changing information per area functions, $\phi(q)$,
provided these quantities change slowly enough. This is not addressed by the proof,
but has been shown to be the case in simulation studies.

Remark  6.6  (Control  Gains  and  Robustness)  The  proportional  control  gain,  k, 
adjusts  the  aggressiveness  of  the  controller.  In  a  discretized  implementation  one 
should  set  this  gain  low  enough  to  provide  robustness  to  discretization  errors  and 
noise  in  the  system.  The  prior  area  per  pixel,  w,  adjusts  how  much  of  the  area  Q 
will  remain  uncovered  in  the  final  configuration.  It  should  be  chosen  to  be  as  large  as 
possible,  but  as  with  k,  should  be  small  enough  to  provide  robustness  to  discretization 
errors  and  noise  in  the  system. 

6.3.1  Rectangular  Field  of  View 

Until  this  point  we  have  assumed  that  the  camera’s  field  of  view,  Bt  is  a  circle, 
which  eliminates  a  rotational  degree  of  freedom.  Of  course,  actual  cameras  have  a 
rectangular  CCD  array,  and  therefore  a  rectangular  field  of  view.  In  this  section  we 
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revisit  the  gradient  component  in  Theorem  6.1  and  calculate  it  for  a  rectangular  field 
of  view  and  a  robot  with  a  rotational  degree  of  freedom. 

Let  the  state  space  of  pt  —  [cf  Zi  ,ipi]T  be  V  —  R3  x  §,  where  ipi  is  the  rotation 
angle.  Define  a  rotation  matrix 


R(A)  = 


cos  ijii  sin  Tpi 
—  sin  ipi  cos  i/jj, 


(6.17) 


where  R(ifi)q  rotates  a  vector  q  expressed  in  the  global  coordinate  frame,  to  a  co¬ 
ordinate  frame  aligned  with  the  axes  of  the  rectangular  field  of  view.  As  is  true  for 
all  rotation  matrices,  R(ijji)  is  orthogonal,  meaning  R(tpi)T  =  Using  this 

matrix,  define  the  field  of  view  of  robot  i  to  be 

Bi  =  {q\  | R(ipi){q  -cf)  |  <  Zi  tan  0}  ,  (6.18) 


where $\theta = [\theta_1, \theta_2]^T$ is a vector of the two half-view angles
associated with two perpendicular edges of the rectangle, as shown in Figure 6-4, and
the $<$ symbol applies element-wise (all elements in the vector must satisfy $<$). We
have to break up the boundary of the rectangle into each of its four edges. Let $l_k$ be
the $k$th edge, and define four outward-facing normal vectors $n_k$, one associated with
each edge, where $n_1 = [1 \; 0]^T$, $n_2 = [0 \; 1]^T$, $n_3 = [-1 \; 0]^T$, and $n_4 = [0 \; -1]^T$. The
cost function, $\mathcal{H}(p_1, \ldots, p_n)$, is the same as for the circular case, as is the area per
pixel function $f(p_i, q)$.
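As a rough illustration of the definition in (6.18), the following sketch tests whether a ground point lies inside a rotated rectangular field of view. The function names `rotation` and `in_field_of_view` and the tuple-based representation are hypothetical choices made for this example, not part of the algorithm as implemented.

```python
import math

def rotation(psi):
    """R(psi) as in (6.17): maps global-frame vectors into the camera-aligned frame."""
    c, s = math.cos(psi), math.sin(psi)
    return ((c, s), (-s, c))

def in_field_of_view(q, c_i, z_i, psi_i, theta):
    """Return True if ground point q lies in the rectangle B_i of (6.18).

    q, c_i: (x, y) tuples; z_i: height of the robot;
    theta: (theta_1, theta_2), the two half-view angles in radians.
    """
    R = rotation(psi_i)
    dx, dy = q[0] - c_i[0], q[1] - c_i[1]
    # Offset of q from the robot, expressed in the camera-aligned frame.
    local = (R[0][0] * dx + R[0][1] * dy, R[1][0] * dx + R[1][1] * dy)
    # Element-wise comparison against the half-widths z_i * tan(theta).
    return all(abs(local[k]) < z_i * math.tan(theta[k]) for k in range(2))
```

With half-widths of 2 and 1 at unit height, a point at (1.5, 0.5) is inside when the rotation angle is zero and outside after a quarter-turn, which is the rotational dependence that the circular model does not have.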


Theorem 6.3 (Rectangular Gradient) The gradient of the cost function $\mathcal{H}(p_1, \ldots, p_n)$
with respect to a robot’s position $p_i$ using the area per pixel function in (6.3) and the
rectangular field of view in (6.18) is given by

\[
\frac{\partial \mathcal{H}}{\partial c_i} = \sum_{k=1}^{4} \int_{Q \cap l_k} (g_q - g_{q,i}) \, R(\psi_i)^T n_k \, \phi(q) \, dq, \tag{6.19}
\]

\[
\frac{\partial \mathcal{H}}{\partial z_i} = \sum_{k=1}^{4} \int_{Q \cap l_k} (g_q - g_{q,i}) \tan\theta^T n_k \, \phi(q) \, dq
- \int_{Q \cap B_i} \frac{2 g_q^2}{a (b - z_i)^3} \, \phi(q) \, dq, \tag{6.20}
\]

and

\[
\frac{\partial \mathcal{H}}{\partial \psi_i} = -\sum_{k=1}^{4} \int_{Q \cap l_k} (g_q - g_{q,i}) \, (q - c_i)^T R(\psi_i + \pi/2)^T n_k \, \phi(q) \, dq. \tag{6.21}
\]

Figure 6-4: The geometry of a camera with a rectangular field of view.


Proof 6.3 The proof is the same as that of Theorem 6.1 up to the point of evaluating
the boundary terms. Equations (6.12) and (6.13) are true. Additionally the angular
component is given by

\[
\frac{\partial \mathcal{H}}{\partial \psi_i} = \int_{Q \cap \partial B_i} (g_q - g_{q,i}) \, \frac{\partial q_{(Q \cap \partial B_i)}}{\partial \psi_i}^{T} n_{(Q \cap \partial B_i)} \, \phi(q) \, dq.
\]

The constraint for points on the $k$th leg of the rectangular boundary is

\[
(q - c_i)^T R(\psi_i)^T n_k = z_i \tan\theta^T n_k,
\]

from (6.18). Differentiate this constraint implicitly with respect to $c_i$, $z_i$, and $\psi_i$ and
solve for the boundary terms to get

\[
\frac{\partial q}{\partial c_i}^{T} R(\psi_i)^T n_k = R(\psi_i)^T n_k,
\]

\[
\frac{\partial q}{\partial z_i}^{T} R(\psi_i)^T n_k = \tan\theta^T n_k,
\]

and

\[
\frac{\partial q}{\partial \psi_i}^{T} R(\psi_i)^T n_k = -(q - c_i)^T R(\psi_i + \pi/2)^T n_k,
\]

where we have used the fact that

\[
\frac{\partial R(\psi_i)}{\partial \psi_i} = \begin{bmatrix} -\sin\psi_i & \cos\psi_i \\ -\cos\psi_i & -\sin\psi_i \end{bmatrix} = R(\psi_i + \pi/2).
\]

Break the boundary integrals into a sum of four integrals, one integral for each edge
of the rectangle. The expression in Theorem 6.3 follows.
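The identity for the derivative of the rotation matrix used in the proof can be checked numerically. The sketch below compares a central finite difference of $R(\psi)$ with $R(\psi + \pi/2)$ at an arbitrary angle; the helper names and the step size are illustrative assumptions.

```python
import math

def rotation(psi):
    """R(psi) with the sign convention of (6.17)."""
    c, s = math.cos(psi), math.sin(psi)
    return [[c, s], [-s, c]]

def rotation_derivative_fd(psi, h=1e-6):
    """Central finite-difference approximation of dR/dpsi, entry by entry."""
    Rp, Rm = rotation(psi + h), rotation(psi - h)
    return [[(Rp[r][k] - Rm[r][k]) / (2 * h) for k in range(2)] for r in range(2)]

def max_abs_diff(A, B):
    """Largest absolute entry-wise difference between two 2x2 matrices."""
    return max(abs(A[r][k] - B[r][k]) for r in range(2) for k in range(2))

psi = 0.7  # arbitrary test angle in radians
err = max_abs_diff(rotation_derivative_fd(psi), rotation(psi + math.pi / 2))
```

The discrepancy `err` is at the level of finite-difference round-off, consistent with the closed-form identity.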


Remark 6.7 (Intuition) The terms in the gradient have interpretations similar to
the ones for the circular field of view. The lateral component (6.19) has one integral
which tends to make the robot move away from neighbors with intersecting fields of
view, while moving to put its entire field of view inside of the environment $Q$. The
vertical component (6.20) comprises two integrals. The first causes the robot to go up
to take in a larger view, while the second causes it to go down to get a better view
of what it already sees. The angular component (6.21) rotates the robot to get more
of its field of view into the environment, while also rotating away from other robots
whose fields of view intersect its own. Computation of the gradient component for the
rectangular field of view is of the same complexity as the circular case, and carries
the same constraint on the communication topology.

6.4 Experiments


We implemented Algorithm 2 on a group of three AscTec Hummingbird flying quad-rotor
robots. Our experiments were performed at CSAIL, MIT in a laboratory
equipped with a Vicon motion capture system. The robots’ position coordinates
(x, y, z, yaw) were broadcast wirelessly at 50 Hz via a 2.4 GHz XBee module. Each
robot was equipped with a custom ARM microprocessor module running a PID position
control loop at 33 Hz. Pitch and roll were fully stabilized by the commercial
controller described in [38]. A schematic of the experimental setup is shown in Figure
6-5.

The coverage algorithm was implemented on the same onboard ARM modules,
running asynchronously in a fully distributed fashion. The algorithm calculated way
points ($c_i(t)$ and $z_i(t)$ from Algorithm 2) at 1 Hz. This time-scale separation between
the coverage algorithm and the PID controller was required to approximate the integrator
dynamics assumed in (2.5). The camera parameters were set to $a = 10^{-6}$ and
$b = 10^{-2}$ m (which are typical for commercially available cameras), the field of view
was $\theta = 35$ deg, the information per area was a constant $\phi(q) = 1$, the prior area per
pixel was $w = 10^{-6}$ m$^2$, and the control gain was $k = 10^{-5}$. The environment to be
covered was a skewed rectangle, 3.7 m across at its widest, shown in white in Figure
6-6.

To  test  the  effectiveness  of  the  algorithm  and  its  robustness  to  robot  failures,  we 
conducted  experiments  as  follows:  1)  three  robots  moved  to  their  optimal  positions 
using  the  algorithm,  2)  one  robot  was  manually  removed  from  the  environment,  and 
the  remaining  two  were  left  to  reconfigure  automatically,  3)  a  second  robot  was  re¬ 
moved  from  the  environment  and  the  last  one  was  left  to  reconfigure  automatically. 
Figure 6-6 shows photographs of a typical experiment at the beginning (Figure
6-6(a)), after the first stage (Figure 6-6(b)), after the second stage (Figure 6-6(c)), and
after the third stage (Figure 6-6(d)).

The initial positions are shown in Figure 6-6(a), the final positions of the three
robots are shown in Figure 6-6(b), the final positions of the two after removing one are
shown in Figure 6-6(c), and the final position of the last robot after removing the second
is shown in Figure 6-6(d). The coverage cost of the robots over the course of the
whole experiment, averaged over 19 experiments, is shown in Figure 6-7, where the error
bars represent one standard deviation. Notice that when one robot is removed, the
cost function momentarily increases, then decreases as the remaining robots find a new
optimal configuration. The algorithm proved to be robust to the significant, highly
nonlinear unmodeled aerodynamic effects of the robots, and to individual robot failures.
This chapter is accompanied by a video showing the experiments and numerical
simulations.

Figure 6-5: This figure shows the experimental setup. The robots’ positions were
captured with a Vicon motion capture system. The robots used their position information
to run the coverage algorithm in a distributed fashion.

We repeated the above experiment a total of 20 times. Of these, 19 were successful,
while in one experiment two of the robots collided in midair. The collision was caused
by an unreliable gyroscopic sensor, not by a malfunction of the coverage algorithm.
With appropriate control gain values, collisions are avoided by the algorithm’s natural
tendency for neighbors to repel one another.

Figure 6-6: Frame shots from an experiment with three AscTec Hummingbird quad-rotor
robots are shown. After launching from the ground (Figure 6-6(a)), the three
robots stabilize in an optimal configuration (Figure 6-6(b)). Then one robot is manually
removed to simulate a failure, and the remaining two move to a new optimal
position (Figure 6-6(c)). Finally a second robot is removed and the last one stabilizes
at an optimal position (Figure 6-6(d)). The robots move so that their fields of view
(which cannot be seen in the snapshots) cover the environment, represented by the
white polygon.

Figure 6-7: The cost function during the three stages of the experiment, averaged
over 19 successful experiments. The error bars denote one
standard deviation. The experiments demonstrate the performance of the algorithm,
and its ability to adapt to unforeseen robot failures.

6.5  Simulations 

We conducted numerical simulations to investigate the scalability and robustness of
the algorithm. Hovering robots with integrator dynamics (2.5) were simulated using
Algorithm 2 on a centralized processor. The values of $a$, $b$, $\theta$, $\phi$, $w$, and $k$ were the
same as in the experiments. The simulations were over a nonconvex environment,
as shown in Figure 6-8. Communication constraints were modeled probabilistically.
The probability of robot $i$ communicating with robot $j$ was calculated as a linear
function of the distance between them, decreasing from 1 at a distance of 0 to 0 at
a distance of $R = 1.5$ m; the environment width was roughly 3 m. Uncertainty
in the robots’ velocity was modeled as white Gaussian noise with covariance $I_3 \times 10^{-4}$ m$^2$/s$^2$
(where $I_3$ is the $3 \times 3$ identity matrix). Figure 6-8 shows the results of
a typical simulation with ten robots. The robots start in an arbitrary configuration
and spread out and up so that their fields of view cover the environment. The
decreasing value of the cost function $\mathcal{H}$ is shown in Figure 6-8(d). The function
does not decrease smoothly because of the simulated communication failures, velocity
noise, and discretized integral computation in Algorithm 2.

Figure 6-8: Results of a simulation with ten robots covering a nonconvex environment
are shown. The ×’s mark the robot positions and the circles represent the fields of
view of their cameras. Communication failures and noise on the robots’ velocities
are also modeled in the simulation. The initial, middle, and final configurations
are shown in 6-8(a), 6-8(b), and 6-8(c), respectively. The decreasing value of the
aggregate information per pixel function, $\mathcal{H}$, is shown in 6-8(d). The jaggedness
of the curve is due to simulated communication failures, noise, and the discretized
integral approximation.
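The probabilistic communication model described in the simulation setup can be sketched as follows. This is only an illustration under stated assumptions: the names `comm_probability` and `can_communicate` are hypothetical, and the use of straight-line distance between full position tuples (rather than, say, ground-plane distance) is a guess, since the text only says "distance".

```python
import math
import random

def comm_probability(p_i, p_j, R=1.5):
    """Linear falloff: 1 at zero separation, 0 at separation R (metres) and beyond."""
    d = math.dist(p_i, p_j)
    return max(0.0, 1.0 - d / R)

def can_communicate(p_i, p_j, rng, R=1.5):
    """Draw one Bernoulli sample deciding whether robot i hears robot j this step."""
    return rng.random() < comm_probability(p_i, p_j, R)

rng = random.Random(0)  # seeded generator so a simulation run is repeatable
link_up = can_communicate((0.0, 0.0, 1.0), (0.5, 0.0, 1.0), rng)
```

Drawing a fresh sample at every update makes each link intermittent, which is one source of the non-smooth decrease in the cost noted above.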


6.6  Synopsis 

In this chapter we presented an application of the ideas from Chapter 3 to design a
distributed control algorithm to allow hovering robots with downward facing cameras
to cover an environment. We incorporated a realistic sensor model for the camera and,
using this model, formulated the cost function $\mathcal{H}$ representing the aggregate information
per pixel of the robots over the environment. The controller is proven to locally
minimize the cost function, and can be used in nonconvex and disconnected environments.
We implemented the algorithm on a group of three autonomous quad-rotor
robots, and experimentally demonstrated robustness to unforeseen robot failures. We
also investigated scalability and robustness to network failures in simulations with ten
flying robots.

Chapter 7


Modeling  Animal  Herds 

7.1  Introduction 

In  this  chapter,  we  demonstrate  an  application  of  the  potential  field  controller  in 
Chapter  3  to  model  the  dynamics  of  groups  of  animals  and  robots.  The  setting  is 
one  of  system  identification:  we  are  presented  with  position  data  from  a  group  of 
agents,  and  we  want  to  learn  a  dynamical  model  with  a  potential  field  structure,  as 
in Chapter 3, to represent the data. The method we present is general; however, we
will demonstrate the technique by modeling a group of cows. The cows are equipped
with GPS sensors that give us position measurements over time. We use system
identification techniques to learn the model, that is, to tune the parameters of the
model to fit the GPS data.

We  wish  to  model  groups  of  interacting  dynamic  agents,  such  as  flocks,  swarms, 
and  herds,  using  measured  data  from  those  agents.  For  example,  we  would  like  to 
use  the  trajectories  of  people  in  a  crowd  to  develop  dynamical  models  that  capture 
the behaviors of the crowd as a whole. This is a prohibitively complicated problem
in general; however, we provide a practical solution by restricting our attention to
a special model structure. We embrace a minimalist approach in that we use only
position measurements, with a minimum of prior environmental information incorporated
into the model. We propose a difference equation model that is decentralized
and nonlinear, though it is designed to be linear-in-parameters. The Least Squares
method is then used to fit model parameters to position data from a group of agents.
Such  a  model  may  then  be  used,  for  example,  to  predict  future  states  of  the  group, 
to  determine  individual  roles  of  agents  within  the  group  (e.g.  leaders  vs.  followers), 
or,  ultimately,  to  control  the  group. 

The  most  immediate  application  of  these  ideas  is  for  virtual  fencing  of  live¬ 
stock  [2,15,113],  in  which  physical  fences  are  replaced  with  sensor/actuator  devices 
mounted  on  the  animals.  The  animals’  positions  are  monitored,  and  if  they  stray 
beyond  a  virtual  fence  line,  the  animals  are  given  cues  to  return  to  the  desired  area. 
Our  modelling  techniques  will  be  useful  for  virtual  fencing  in  several  ways.  Firstly, 
our  models  lead  to  verified  behavioral  simulations  that  can  be  used  to  test  virtual 
fencing  algorithms  in  a  simulation  environment  before  they  are  implemented  in  a 
costly  and  time-consuming  field  test.  Secondly,  our  dynamical  models  can  be  used 
to  enhance  the  animal  control  algorithm  itself,  so  that  it  works  in  conjunction  with 
the  animals’  natural  tendencies.  Finally,  since  our  model  is  inherently  distributed 
and,  because  of  our  minimalist  approach,  requires  little  computational  resources,  we 
envision  that  the  model  can  run  online  over  the  same  network  of  animal-mounted 
sensor  devices  that  carry  out  the  virtual  fencing  algorithm.  The  distributed  model 
can  then  be  used  to  predict  where  the  group  is  headed  and  inform  the  controller  in 
real  time.  Simultaneously,  the  model  can  be  updated  to  fit  the  most  recent  position 
data  collected  from  the  animals.  This  simultaneous  model  learning  and  model-based 
control  is  in  the  spirit  of  adaptive  control. 

In  addition  to  livestock  management  applications,  there  are  many  other  uses  for 
learned  models  of  distributed  dynamical  systems.  In  the  case  of  people,  the  ability 
to  model  group  behavior  has  numerous  applications  in  surveillance,  urban  planning, 
and  crowd  control.  Also,  the  models  can  be  used  to  drive  groups  of  robots  to  mimic 
the  behavior  of  observed  groups.  This  may  be  useful  in  reproducing  collaborative 
behaviors  exhibited  in  natural  systems,  or  in  producing  decoy  robots  to  participate 
with natural or engineered groups, and even to influence group behavior [39].

7.1.1 Related Work


The  problem  of  learning  models  for  groups  of  interacting  dynamic  agents  lies  at  the 
intersection  of  two  fields  of  research:  modeling  of  distributed  dynamical  systems,  and 
system  identification.  A  vigorous  body  of  work  is  emerging  from  the  controls  and 
robotics  communities  focused  on  analyzing  models  of  flocks,  swarms,  and  similar  dis¬ 
tributed  dynamical  systems.  This  work,  however,  has  not  considered  using  learning 
techniques  to  generate  these  models  from  data.  Instead,  it  concentrates  on  the  dy¬ 
namical  properties  of  models,  such  as  stability  of  formations  [34,35,104,105,116], 
asymptotic  consensus  of  agent  positions  or  velocities  [27,46,71,111],  or  designing  lo¬ 
cal  controllers  from  global  specifications  [7,106].  These  considerations  are  elemental 
in  describing  more  complex  social  phenomena,  but  they  are  quite  different  from  the 
question  of  learning  models  from  data  which  we  address  in  this  chapter. 

Conversely,  the  rich  literature  on  learning  dynamical  systems  from  data,  often 
called  system  identification,  has  not  yet  addressed  models  of  distributed  dynamical 
systems,  such  as  the  ones  we  consider  in  this  thesis.  Some  related  problems  have  been 
considered,  however.  For  example,  in  [22]  a  system  identification  technique  is  used 
to  model  global  properties  of  a  swarm  of  robots  over  time  using  observed  data  from 
the  robots.  These  properties  include  collision  likelihoods  of  robots  and  transition 
probabilities  among  robot  behaviors.  There  also  has  been  considerable  activity  in 
learning  behavioral  models  of  individual  natural  agents.  In  [69]  and  [74],  system 
identification  is  carried  out  on  switching  linear  systems  to  learn  models  of  the  honey 
bee  waggle  dance  and  human  hand  motion,  respectively,  and  in  [28]  a  technique  is  used 
to  find  Motion  Description  Language  (MDL)  codes  from  observed  ants.  These  works, 
however,  do  not  consider  group  interactions,  but  investigate  the  action  of  individuals 
isolated  from  their  group  roles. 

It is our intention in this chapter to bridge the gap between these two research
communities by applying system identification techniques to distributed model structures.
In addition to this cross-pollination of ideas, we also contribute a new technique
for modelling general vector fields (i.e. non-gradient vector fields) in a way that is
amenable to system identification. We also pursue our ideas from theory through
implementation by testing our method with data from natural agents.

For  this  purpose,  we  developed  a  hardware  platform  to  record  position  and  orien¬ 
tation  information  of  groups  of  free-ranging  cows.  The  hardware  platform  is  capable 
of  recording  GPS  position  information,  head  orientation,  and  is  able  to  provide  sound 
and  electrical  stimuli,  though  no  stimuli  were  administered  during  the  data  collection 
for this study. We demonstrate our model learning technique by fitting it to GPS
data collected from a group of three and a group of ten free-ranging cows, and validate
the resulting models by testing the whiteness of the residual error, and by comparing
global statistics of simulations versus the actual data. Previous works have considered
animal-mounted sensor network devices, such as the ZebraNet platform [47], and the
sensor/actuator devices described in [2,15,84,113] for automatic livestock management.
Our device has several innovations for applying animal control stimuli and for
using communication between devices over a network; however, we do not describe
these innovations in detail in this chapter. In the context of this chapter, the devices
were used as a means to collect GPS data for learning and validating dynamical
models.

7.1.2  Contributions 

The  main  contribution  in  this  chapter  is  as  follows: 

1. A model similar to the potential field model of Chapter 3 is proposed for modeling
the herding behavior of cows and other groups of agents. Least squares
system identification is then used to tune the parameters of the model (similar
to the on-line learning in Chapter 4) to fit GPS data from actual cows. We
also analyze the predictive ability of the model and simulate an application to
deriving controllers for robots to behave like a herd of cows.

The remainder of this chapter is organized as follows. The model structure is
described in Section 7.2. The application of system identification to identify model
parameters is described in Section 7.3, along with a review of basic system identification
techniques in Section 7.3.1. Our data collection device and experimental method
are described in Section 7.4. Results of the system identification technique are presented
in Section 7.5 with GPS tracking data from a group of three cows and a group
of ten cows, and the quality of the learned models is evaluated in Section 7.5.3.
Finally, in Section 7.6 we use a learned model to control a group of mobile robots
to behave like the group of three cows. Simulation results of the group of robots are
presented. A synopsis and directions for future work are given in Section 7.7.


7.2  Model  Description 


We consider a linear-in-parameters model structure with three naturally distinct parts
to describe the motion of coupled physical agents moving over a plane surface. Firstly,
each agent is given internal dynamics to enforce the constraints of Newton’s laws.
Secondly, a force$^1$ is applied to each agent from its interaction with each of the other
agents in the group. Thirdly, a force is applied to each agent as a function of its
position in the environment. All remaining effects are modeled as a white noise
process.

Throughout this section, we refer to free parameters as $\theta$, and features, or regressors,
are denoted by $\phi$. It should be understood that the parameters $\theta$ are left
unknown for now. In Section 7.3 we describe how position data is used to tune these
parameters to fit the data. A schematic showing the different parts of the model
learning process is shown in Figure 7-1.


$^1$ In this chapter the term “force” is used in a metaphoric sense. When we talk of a “force” we
are referring to the intention of the agent to accelerate in a particular way using its own motive
mechanisms.

Figure 7-1: A schematic of the method of system identification. The time-correlated
(not independent identically distributed) data and the
model structure are combined in an optimization procedure to get model parameters
tuned to fit the data.


7.2.1  Individual  Agent  Dynamics 


Given a group of $m$ agents, the proposed model structure for an individual agent
$i \in \{1, \ldots, m\}$ can be written in state-space, difference equation form as

(7.1)


Agent $i$’s state $x_i^\tau = [e_i^\tau \; n_i^\tau \; u_i^\tau \; v_i^\tau]^T$ consists of its East position, North position,
Eastern component of velocity, and Northern component of velocity after the $\tau$th
iteration, and its position is given by $p_i^\tau = [e_i^\tau \; n_i^\tau]^T$. The time step $\Delta t$ is given by
$t^{\tau+1} - t^\tau$, and we assume it is constant for all $\tau$. The term $a_i$ represents damping: $a_i = 1$
for zero damping, and $|a_i| < 1$ for stable systems. The function $f_{ij}(p_i^\tau, p_j^\tau)$ determines
the coupling force applied by agent $j$ to agent $i$. The function $g_i(p_i^\tau)$ represents
the force applied by the environment to the agent at point $p_i^\tau$. Finally, $w_i^\tau$ is a zero-mean,
stationary, Gaussian white noise process uncorrelated with $p_j$ $\forall j$, used to model
the unpredictable decision-motive processes of agent $i$. Nonholonomic constraints
which are often present in mobile agents, such as people, cattle, or automobiles, are
neglected in this treatment, though they could be incorporated with an increase in
the complexity of the model structure. Note that the force terms are only applied to
affect changes in velocity in accordance with Newton’s second law.

Figure 7-2: The magnitude of the agent-to-agent interaction force is shown on the left
for $\theta_1 = \theta_2 = 1$. On the right, the vector field representing the force felt by an agent
at each point on the plane is shown for an example agent trajectory. The swirling
patterns evident in the field are made possible by a novel parameterization. Panel
titles: Agent-to-Agent Interaction Force (left), Environment-to-Agent Interaction Force (right).
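As a rough illustration of the structure described above, the following sketch advances one agent by one time step. The specific update used here (positions integrate the current velocity over $\Delta t$, and velocities are scaled by $a_i$ and then incremented directly by the total force and noise) is an assumption chosen to match the verbal description; it should not be read as the exact form of (7.1). The name `step_agent` and its arguments are hypothetical.

```python
def step_agent(x, force, a_i, dt, noise=(0.0, 0.0)):
    """Advance one agent state x = (e, n, u, v) by one time step.

    Assumed form: positions integrate the current velocity over dt; the
    velocity is damped by a_i and then changed by the total force plus noise.
    `force` is the sum of the agent-to-agent and environment-to-agent forces.
    """
    e, n, u, v = x
    f_e, f_n = force
    w_e, w_n = noise
    # Force and noise enter only the velocity rows, never the position rows.
    return (e + dt * u, n + dt * v, a_i * u + f_e + w_e, a_i * v + f_n + w_n)

# One step for an agent moving north-east with a small net force.
x_next = step_agent((0.0, 0.0, 1.0, 2.0), force=(0.5, -0.5), a_i=0.9, dt=2.0)
```

With `a_i = 1` and zero force the velocity is unchanged from step to step, which corresponds to the zero-damping case mentioned in the text.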


7.2.2 Agent-to-Agent Interaction Force


Dropping the $\tau$ superscripts for clarity, the form of the agent coupling force $f_{ij}(p_i, p_j)$
is given by

\[
f_{ij}(p_i, p_j) = \left( \theta_{1ij} - \frac{\theta_{2ij}}{\|p_j - p_i\|} \right) \frac{n_{ij}}{(m - 1)}, \tag{7.2}
\]

where $n_{ij} = (p_j - p_i)/\|p_j - p_i\|$ is the unit vector along the line from $p_i$ to $p_j$ (henceforth,
$\|\cdot\|$ will denote the $\ell^2$ norm). The factor $(m - 1)$ is included to normalize the force
exerted by one neighbor by the total number of neighbors.
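The force law in (7.2) is simple enough to sketch directly. In the example below the function name `pair_force` and the tuple-based positions are illustrative assumptions; the arithmetic follows (7.2) term by term.

```python
import math

def pair_force(p_i, p_j, theta1, theta2, m):
    """Force on agent i due to agent j, following (7.2).

    p_i, p_j: (east, north) positions; theta1, theta2: the two parameters
    for this ordered pair; m: total number of agents in the group.
    """
    dx, dy = p_j[0] - p_i[0], p_j[1] - p_i[1]
    dist = math.hypot(dx, dy)
    n_ij = (dx / dist, dy / dist)  # unit vector pointing from p_i toward p_j
    scale = (theta1 - theta2 / dist) / (m - 1)
    return (scale * n_ij[0], scale * n_ij[1])

# Two agents with theta1 = theta2 = 1: the force changes sign at distance 1.
near = pair_force((0.0, 0.0), (0.5, 0.0), 1.0, 1.0, m=2)  # points away from j
far = pair_force((0.0, 0.0), (2.0, 0.0), 1.0, 1.0, m=2)   # points toward j
```

For positive parameters the scalar factor is zero when the separation equals `theta2 / theta1`, negative (repulsive) below it, and positive (attractive) above it.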

This is the simplest of a family of force laws commonly used in computational
models of physical, multi-body systems. The important feature of this family is
that an agent is repulsed from its neighbor at close distances and attracted to its
neighbor at far distances. To see this property clearly, examine the magnitude of
force exerted by one neighbor ($m - 1 = 1$), given by $\|f_{ij}\| = \theta_{1ij} - \theta_{2ij}/\|p_j - p_i\|$ and
shown in the left of Figure 7-2. Here $\|f_{ij}\|$ is understood as the signed magnitude along
$n_{ij}$, which changes sign at $\|p_j - p_i\| = \theta_{2ij}/\theta_{1ij}$. Notice that with $\theta_{1ij} > 0$ and $\theta_{2ij} > 0$ the desired
characteristic is achieved. Indeed, as $\|p_j - p_i\| \to 0$, $\|f_{ij}\| \to -\infty$, which is repulsive,
while as $\|p_j - p_i\| \to \infty$, $\|f_{ij}\| \to \theta_{1ij} > 0$, which is attractive. Other, similar force
laws can be created to produce either unbounded attraction or zero
attraction as $\|p_j - p_i\| \to \infty$. We chose this law for its simplicity. The function can
equivalently be expressed as the gradient of a potential function.


After some manipulation, the sum of $f_{ij}$ over all neighbors $j$ can be expressed as


where

and where the indices $j = i$ are excluded from the above vectors (since we do not
want an agent to feel a force from itself). This notation will be useful in what follows.
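One way to see why the summed force is linear in the parameters is to stack the unit vectors into a regressor matrix and the parameters into a vector. The sketch below does this and compares the product with a direct sum of (7.2). The column ordering, the names `regressor` and `total_force`, and the dictionary indexing of parameters by neighbor are all assumptions made for this illustration; they are not necessarily the notation used in the thesis.

```python
import math

def unit_and_dist(p_i, p_j):
    """Unit vector from p_i to p_j and the distance between them."""
    dx, dy = p_j[0] - p_i[0], p_j[1] - p_i[1]
    d = math.hypot(dx, dy)
    return (dx / d, dy / d), d

def regressor(i, positions):
    """2 x 2(m-1) matrix (two rows) that multiplies the stacked parameters.

    Columns: n_ij/(m-1) for each neighbour j, then -n_ij/((m-1)*dist_ij).
    """
    m = len(positions)
    first, second = [], []
    for j, p_j in enumerate(positions):
        if j == i:
            continue  # an agent exerts no force on itself
        n_ij, d = unit_and_dist(positions[i], p_j)
        first.append((n_ij[0] / (m - 1), n_ij[1] / (m - 1)))
        second.append((-n_ij[0] / (d * (m - 1)), -n_ij[1] / (d * (m - 1))))
    cols = first + second
    return [[c[0] for c in cols], [c[1] for c in cols]]

def total_force(i, positions, theta1, theta2):
    """Direct sum of (7.2) over neighbours; theta1[j], theta2[j] index neighbour j."""
    m = len(positions)
    fx = fy = 0.0
    for j, p_j in enumerate(positions):
        if j == i:
            continue
        n_ij, d = unit_and_dist(positions[i], p_j)
        s = (theta1[j] - theta2[j] / d) / (m - 1)
        fx += s * n_ij[0]
        fy += s * n_ij[1]
    return fx, fy
```

Multiplying the rows of `regressor(i, positions)` by the stacked vector of all first parameters followed by all second parameters reproduces `total_force`, which is the property that makes least-squares fitting applicable.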

The agent-to-agent force law with the dynamics described above gives a so-called
potential field based flocking model, the analytical properties of which have been
treated extensively in the controls and robotics literature [34,35,104,105,116]. The
environment-to-agent force described below makes our model rather different, however,
and the inclusion of the noise term $w_i^\tau$ makes the model a random process, which is
fundamentally different from the deterministic systems treated in those works.


7.2.3 Environment-to-Agent Interaction Force

The agent’s preference for certain paths in the environment is modeled as a nonlinear
mapping from each point on the plane to a force vector felt by the agent. To this end,
two networks of Gaussian basis functions are used, one for each of two perpendicular
force components.

In particular, the function $g_i(p_i)$ can be written

(7.4)

where

(7.5)

is the bivariate Gaussian function, and $k \in \{1, \ldots, n\}$. Each Gaussian is centered
at $\gamma_{ik}$, with standard deviation $\sigma_{ik}$, and its strength is represented by the unknown
parameters $\theta_{u_{ik}}$ for the Eastern component, and $\theta_{v_{ik}}$ for the Northern component.
Gaussian basis functions were chosen for their familiarity, the objective being to
demonstrate the modeling approach with a minimum of complications. A number of
other basis function types could be used, including wavelets, sigmoidal functions, or
splines.
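A minimal sketch of such a two-network Gaussian parameterization is given below. It assumes isotropic Gaussians with the usual $1/(2\pi\sigma^2)$ normalization, which may differ from the exact form of (7.5); the names `gaussian_basis` and `environment_force` are hypothetical.

```python
import math

def gaussian_basis(p, center, sigma):
    """Isotropic bivariate Gaussian centred at `center` (normalisation is assumed)."""
    d2 = (p[0] - center[0]) ** 2 + (p[1] - center[1]) ** 2
    return math.exp(-d2 / (2.0 * sigma ** 2)) / (2.0 * math.pi * sigma ** 2)

def environment_force(p, centers, sigmas, theta_u, theta_v):
    """East and North force components as two independently weighted sums.

    The same basis functions are shared, but each component has its own
    weight vector (theta_u for East, theta_v for North).
    """
    basis = [gaussian_basis(p, c, s) for c, s in zip(centers, sigmas)]
    east = sum(w * b for w, b in zip(theta_u, basis))
    north = sum(w * b for w, b in zip(theta_v, basis))
    return east, north

# A single basis function at the origin, evaluated at its centre.
g = environment_force((0.0, 0.0), [(0.0, 0.0)], [1.0],
                      theta_u=[2.0 * math.pi], theta_v=[-2.0 * math.pi])
```

Because the East and North weights are independent, the resulting field is not constrained to be the gradient of a single scalar potential.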

It is important to note that a vector field parameterized in this way is not a
potential gradient. A potential gradient field cannot admit circulation around closed
paths.$^2$ We introduce a non-gradient parameterization to enable circulation, as one
can imagine agents intending to traverse closed orbits on the plane. For example,
a cow may have a routine of passing between a water source, a shaded tree, and a
grassy patch in a periodic fashion.

Figure 7-2, on the right, shows a plot of an example force field parameterized
in the above way. The arrows show the forces induced by the field, the heavy dots
show the centers of the Gaussian functions, $\gamma_{ik}$, and the curve shows the path of an
agent over the vector field. The swirling patterns evident in the vector field would be
impossible if it were a gradient field.
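The distinction between gradient and non-gradient fields can be illustrated numerically by integrating each kind of field around a closed path. The two example fields and the function name `circulation` below are hypothetical and chosen only for this demonstration; they are not fields learned from data.

```python
import math

def circulation(field, radius=1.0, steps=2000):
    """Approximate the line integral of `field` around a circle about the origin."""
    total = 0.0
    dt = 2.0 * math.pi / steps
    for k in range(steps):
        t = (k + 0.5) * dt  # midpoint of the k-th arc segment
        x, y = radius * math.cos(t), radius * math.sin(t)
        # Tangent vector times the parameter increment.
        dx, dy = -radius * math.sin(t) * dt, radius * math.cos(t) * dt
        fx, fy = field(x, y)
        total += fx * dx + fy * dy
    return total

def gradient_field(x, y):
    """V = -grad(Psi) for the example potential Psi = (x - 0.3)**2 + 2*y**2."""
    return (-2.0 * (x - 0.3), -4.0 * y)

def swirl_field(x, y):
    """A rotational field with constant curl 2; not the gradient of any potential."""
    return (-y, x)

grad_circ = circulation(gradient_field)  # numerically zero
swirl_circ = circulation(swirl_field)    # close to 2*pi for the unit circle
```

The gradient field yields no net circulation, while the rotational field does, which is why closed orbits require a parameterization that is not restricted to potential gradients.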

The expression in (7.4) can be put into a different form to match that of (7.3). In
particular


where

This form will become useful in what follows.

$^2$ Proof: Let $\Psi(p)$ be a potential function and $V(p) = -\mathrm{grad}(\Psi)$ its gradient field. Then $\mathrm{curl}(V) =
\mathrm{curl}(-\mathrm{grad}(\Psi)) = 0$, thus by Green’s Theorem, $\oint_s V \, ds = \int_{A_s} \mathrm{curl}(V) \, dA = 0$, where $s$ is any closed
curve on the plane, and $A_s$ is the area enclosed by $s$.


To assess the computational complexity of this model, note that the number of agent-to-agent interaction terms grows as the square of the number of agents, $O(m^2)$, and, in the worst case, the number of environment-to-agent interaction terms grows as the product of the time duration and the number of agents, $O(Tm)$. Therefore computing successive iterations of the model, not to mention learning the model parameters, will become intractable as the size of the group approaches hundreds or thousands of members. However, we can alleviate these difficulties in a natural way. Firstly, if the area in which the agents move is bounded, the environment-to-agent interaction terms will approach $O(m)$ as the entire area is explored by all the agents. Also, for large groups we could simply add a finite communication radius around each agent, so that neighbor agents outside that radius do not produce a force. Provided the number of neighbors inside the radius stays bounded, this would limit the complexity of agent-to-agent parameters to $O(m)$. Thus we can modify the model to have an overall complexity linear in the number of agents. Also, the model is naturally decentralized, thus it could easily be implemented on a network of, say, $m$ processors, reducing the computation time to a constant independent of the size of the group. In this chapter we do not consider such implementation issues, and the groups of agents we deal with are small enough that computation speed is not a concern.
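The effect of a finite communication radius on the number of agent-to-agent terms can be illustrated with a minimal sketch. The function name `interaction_pairs`, the line-shaped layout, and the radius value are assumptions made for illustration only; they are not part of the model above. Note that this brute-force neighbor search is itself quadratic; it is only the number of resulting force terms that grows linearly here, and only because the agent density is bounded.

```python
import math

def interaction_pairs(positions, radius=None):
    """Ordered pairs (i, j), i != j, for which agent j exerts a force on agent i.

    With radius=None every other agent interacts, giving m * (m - 1) terms.
    """
    pairs = []
    for i, p_i in enumerate(positions):
        for j, p_j in enumerate(positions):
            if i == j:
                continue
            if radius is None or math.dist(p_i, p_j) <= radius:
                pairs.append((i, j))
    return pairs

# Hypothetical layout: 50 agents on a line, 1 m apart.
line = [(float(k), 0.0) for k in range(50)]
n_all = len(interaction_pairs(line))                # all-to-all interaction
n_local = len(interaction_pairs(line, radius=1.5))  # nearest neighbors only
print(n_all, n_local)  # 2450 98
```

With all-to-all interaction the 50 agents produce 50 x 49 terms, while the radius of 1.5 m leaves at most two neighbors per agent.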
7.3 System Identification with Least-Squares Fitting

We  will  provide  a  brief  introduction  to  the  field  of  system  identification.  Then  we 
will  use  a  Least  Squares  method  to  identify  optimal  parameters  for  our  model.  We 
will  also  discuss  recursive  methods  for  Least  Squares  fitting  that  can  be  used  to  tune 
parameters  for  our  model  on-line  as  data  is  collected. 

7.3.1  Method  Overview 

In this section we employ the tools of system identification [59], the basics of which are briefly reviewed here as they may be unfamiliar to the reader. If a stochastic dynamical system is such that its state at the next time step is determined by its state at the current time step and the inputs at the current time step, a state space model of its dynamics can be formed as a difference equation,

$x^{\tau+1} = F(x^\tau, u^\tau, w^\tau, \tau)$, (7.7)

where $x$ is the state, $u$ is the input, $w$ is a zero mean, stationary, Gaussian white noise process and $\tau$ is the discrete time index. Furthermore, we may formulate a model structure in which several of the parameters, $\theta$, of the model are unknown. If these parameters are time-invariant and occur linearly in the function $F$, and if the noise is additive, we can write the model as

$x^{\tau+1} = \phi(x^\tau, u^\tau, \tau)\theta + w^\tau$, (7.8)

where $\phi$ is a row vector of functions of the state, input, and time (these are called statistics, regressors, or features depending upon the research field in which they are used) and $\theta$ is a column vector of the unknown parameters. Suppose we have some arbitrary value of the parameters of the system, $\theta$. Then we can interpret (7.8) as a means of predicting the expected output at the next time step given the state $x^\tau$, inputs $u^\tau$, and time $\tau$ measured at the current time,

$\hat{x}^{\tau+1} = E[x^{\tau+1} \mid x^\tau, u^\tau, \tau, \theta] = \phi(x^\tau, u^\tau, \tau)\theta$, (7.9)

where the $\hat{\cdot}$ denotes a predicted value ($w^\tau$ drops out in the expectation because it is zero mean). Notice that the predicted output is a function of the parameter values, $\hat{x}^{\tau+1}(\theta)$. If we then compare the predicted value, $\hat{x}^{\tau+1}$, with the actual value, $x^{\tau+1}$, we have an error that gives an indication of how different our model is from the actual system. We form a cost function using this error. One common cost function is constructed from summing over all the squared errors that we have collected from measuring the output of the actual system and comparing it to the predicted output, $J = \sum_\tau \|\hat{x}^\tau(\theta) - x^\tau\|^2$. We can then use an analytical optimization method, the Least Squares method, to find the parameters $\theta = \theta^*$ that minimize the cost function $J(\theta)$. This can be interpreted as “fitting” the model parameters to the data, and it results in a model that we would expect to give the best prediction of outputs given the inputs. This process is described graphically in Figure 7-1.
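As a minimal sketch of this procedure, the following fits a toy scalar system of the form (7.8). The system, its parameter values, the noise level, and the variable names are all hypothetical choices made for illustration; the regressor is simply $\phi = [x^\tau \; u^\tau]$ and the fit uses NumPy's generic least-squares solver.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear-in-parameters system: x[t+1] = a*x[t] + b*u[t] + w[t]
theta_true = np.array([0.9, 0.5])      # [a, b], illustrative values only
N = 2000
x = np.zeros(N + 1)
u = rng.standard_normal(N)             # input signal
w = 0.05 * rng.standard_normal(N)      # zero-mean white noise

for t in range(N):
    phi_t = np.array([x[t], u[t]])     # regressor row phi(x, u, t)
    x[t + 1] = phi_t @ theta_true + w[t]

# Stack one regressor row per time step and the measured next states.
Phi = np.column_stack([x[:-1], u])
Y = x[1:]

# theta_hat minimizes J(theta) = sum_t (phi_t theta - x[t+1])^2
theta_hat, *_ = np.linalg.lstsq(Phi, Y, rcond=None)
J = np.sum((Phi @ theta_hat - Y) ** 2)
print(theta_hat, J)
```

The fitted parameters should land close to the values used to generate the data, and the cost at the fitted parameters is never larger than the cost at the true ones.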

System identification shares many similarities with machine learning. However, since it deals with dynamical systems, the training data are time correlated and are presented in a specific order; they are not independent and identically distributed (IID). This
is  a  crucial  difference  between  system  identification  and  most  other  computational 
learning  problems.  Because  of  this  fact,  much  of  the  machine  learning  intuition  does 
not  apply  to  system  identification,  especially  with  regard  to  model  validation,  as 
described  in  more  detail  in  Section  7.5.3. 

7.3.2  Manipulating  the  Linear  Model 

The model structure discussed in Section 7.2 has the convenient property that it is linear in its unknown parameters. For this reason, it can be manipulated into a form so that its parameters can be fitted using the system identification technique described above. In keeping with our minimalist approach, we assume that only position measurements, $p_i^\tau$, $\tau = 1, \ldots, N$, are available to perform the fitting. We can eliminate $u_i$ and $v_i$ from the dynamics in (7.1) to provide a second order equation in the position only. Notice that from (7.1) we can write

$p_i^{\tau+1} = p_i^\tau + \Delta t \begin{bmatrix} u_i^\tau \\ v_i^\tau \end{bmatrix}$ (7.10)

and

$\begin{bmatrix} u_i^{\tau+1} \\ v_i^{\tau+1} \end{bmatrix} = a_i \begin{bmatrix} u_i^\tau \\ v_i^\tau \end{bmatrix} + \sum_{j=1, j \neq i}^{m} f_{ij} + g_i + w_i^\tau.$ (7.11)

We can solve (7.10) for $[u_i^\tau \; v_i^\tau]^T$ and substitute into the right hand side of (7.11). We then substitute the result back into the right hand side of (7.10), shifting time indices appropriately, to obtain the desired expression

$p_i^{\tau+2} = p_i^{\tau+1} + (p_i^{\tau+1} - p_i^\tau) a_i + \Delta t \Big( \sum_{j=1, j \neq i}^{m} f_{ij} + g_i + w_i^\tau \Big).$ (7.12)

We can use the above expression to formulate a one-step-ahead predictor in the form of (7.9). First, define the combined regressor vectors $\phi_{u_i}^\tau = [\, (e_i^{\tau+1} - e_i^\tau)/\Delta t \;\; \phi_{fu_i}^\tau \;\; \phi_{gu_i}^\tau \,]$ and $\phi_{v_i}^\tau = [\, (n_i^{\tau+1} - n_i^\tau)/\Delta t \;\; \phi_{fv_i}^\tau \;\; \phi_{gv_i}^\tau \,]$, and a combined parameter vector $\theta_i = [\, a_i \;\; \theta_{f_i}^T \;\; \theta_{g_i}^T \,]^T$. By taking the expectation conditioned on the positions, substituting (7.3) and (7.6) for $\sum_{j \neq i} f_{ij}$ and $g_i$, respectively, then making use of the combined regressor and parameter vectors we get

$\hat{p}_i^{\tau+2} = p_i^{\tau+1} + \Delta t \begin{bmatrix} \phi_{u_i}^\tau \\ \phi_{v_i}^\tau \end{bmatrix} \theta_i,$ (7.13)

where $\hat{p}_i^{\tau+2}$ is the expected value of $p_i$ after $\tau + 2$ time steps, given positions up to $\tau + 1$, and $w_i^\tau$ drops out in the conditional expectation.
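The elimination of the velocities can be checked numerically with a minimal sketch. Here a generic random array `force` stands in for the sum of the agent-to-agent forces, the environment force, and the noise at each step; this stand-in, the time step, and the damping value are assumptions made for illustration. The first-order recursion for position and velocity is simulated, and the simulated positions are then compared against the position-only second-order recursion.

```python
import numpy as np

rng = np.random.default_rng(1)
dt, a_i = 1.0, 0.8                       # hypothetical time step and damping
N = 50

p = np.zeros((N + 2, 2))                 # positions [east, north]
vel = np.zeros((N + 2, 2))               # velocities [u, v]
force = rng.standard_normal((N + 1, 2))  # stand-in for sum_j f_ij + g_i + w_i

for t in range(N + 1):
    p[t + 1] = p[t] + dt * vel[t]          # position update, as in (7.10)
    vel[t + 1] = a_i * vel[t] + force[t]   # velocity update, as in (7.11)

# Position-only recursion, as in (7.12), for tau = 0, ..., N-1
lhs = p[2:N + 2]
rhs = p[1:N + 1] + a_i * (p[1:N + 1] - p[0:N]) + dt * force[0:N]
max_err = np.max(np.abs(lhs - rhs))
print(max_err)
```

The two sides agree to floating-point precision, which confirms that the velocities need not be measured directly.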
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7.3.3  Batch  Method 


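The so-called Least Squares Batch Method is now implemented to find the optimal model parameters. Specifically, we wish to find the parameters, $\theta_i$, to minimize the mean squared prediction error over all available time steps. The mean squared prediction error can be written $J_i = \frac{1}{N-2} \sum_{\tau=1}^{N-2} (\hat{p}_i^{\tau+2} - p_i^{\tau+2})^T (\hat{p}_i^{\tau+2} - p_i^{\tau+2})$. Substituting (7.13) and (7.10) into this expression yields

$J_i = \frac{\Delta t^2}{N-2} (Y_i - \Phi_i \theta_i)^T (Y_i - \Phi_i \theta_i),$ (7.14)

where $Y_i$ stacks the velocity components $u_i^{\tau+1}$ and $v_i^{\tau+1}$, $\Phi_i$ stacks the corresponding regressor rows $\phi_{u_i}^\tau$ and $\phi_{v_i}^\tau$ for $\tau = 1, \ldots, N-2$, and $u_i^\tau$ and $v_i^\tau$ are obtained from (7.10). The Least Squares problem is then formulated as $\theta_i^* = \arg\min_{\theta_i} J_i(\theta_i)$. Following the typical procedure for solving the Least Squares problem we find that

$\theta_i^* = [\Phi_i^T \Phi_i]^{-1} \Phi_i^T Y_i.$ (7.16)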


The  right  hand  side  of  (7.16)  consists  entirely  of  measured  data  while  the  left  hand 
side  is  the  vector  which  represents  the  optimal  parameters  of  the  model.  We  assume 
that  the  data  are  rich  enough  that  the  matrix  inversion  in  (7.16)  is  possible.  The 
deep  implications  of  this  invertibility  are  discussed  in  [59].  The  myriad  merits  and 
deficiencies  of  Least  Squares  fitting  compared  with  other  learning  methods  will  not 
be  discussed  in  this  chapter. 

The white noise signal $w_i^\tau$ can now be estimated using the resulting residual error in the fitting process, so that

$\hat{w}_i^\tau = \begin{bmatrix} u_i^{\tau+1} \\ v_i^{\tau+1} \end{bmatrix} - \begin{bmatrix} \phi_{u_i}^\tau \\ \phi_{v_i}^\tau \end{bmatrix} \theta_i^*,$ (7.17)

where $\hat{w}_i^\tau$ is our estimate of $w_i^\tau$. If the “true” system dynamics are represented by the fitted model, we expect to find that $\hat{w}_i^\tau$ is zero-mean, stationary, Gaussian white noise, as this would confirm our initial assumption on the properties of $w_i^\tau$. Specifically, for perfect fitting, $E[\hat{w}_i(t) \hat{w}_i^T(t + \tau)] = \delta(\tau) Q_i$, where $\delta(\cdot)$ is the Kronecker delta function. Therefore, the “whiteness” of $\hat{w}_i$ can be used as an indicator of the goodness of fit that has been achieved. We use this fact in Section 7.5.3 to validate our learned models. For simulation purposes, as in Section 7.6, we would assume $w_i^\tau$ is a white noise process with covariance $Q_i$ equal to the empirical covariance of $\hat{w}_i^\tau$.
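The batch procedure can be sketched for a single agent under a deliberately simplified model. In place of the agent-to-agent and environment regressors of this chapter, the sketch assumes a damping term plus a constant force, so the parameter vector is $[a, c_{east}, c_{north}]$; this reduced regressor set, the function name `batch_fit`, and all numerical values are assumptions made for illustration. Velocities are obtained by differencing positions, the normal equations are solved, and the residuals give an empirical noise covariance.

```python
import numpy as np

def batch_fit(p, dt):
    """Fit theta = [a, c_east, c_north] from an (N, 2) array of positions.

    Simplified stand-in model: vel[t+1] = a * vel[t] + c + noise.
    """
    vel = np.diff(p, axis=0) / dt                 # velocities from positions
    u_now, v_now = vel[:-1, 0], vel[:-1, 1]
    u_next, v_next = vel[1:, 0], vel[1:, 1]
    ones, zeros = np.ones_like(u_now), np.zeros_like(u_now)
    # Stack east-component rows, then north-component rows.
    Phi = np.vstack([np.column_stack([u_now, ones, zeros]),
                     np.column_stack([v_now, zeros, ones])])
    Y = np.concatenate([u_next, v_next])
    theta = np.linalg.solve(Phi.T @ Phi, Phi.T @ Y)   # normal equations
    resid = (Y - Phi @ theta).reshape(2, -1).T        # one residual row per step
    Q = np.cov(resid, rowvar=False)                   # empirical noise covariance
    return theta, resid, Q

# Synthetic track generated from the same simplified model (hypothetical values).
rng = np.random.default_rng(2)
dt, a_true, c_true, sigma = 1.0, 0.7, np.array([0.1, -0.05]), 0.02
N = 3000
p = np.zeros((N, 2))
vel = np.zeros(2)
for t in range(N - 1):
    p[t + 1] = p[t] + dt * vel
    vel = a_true * vel + c_true + sigma * rng.standard_normal(2)

theta, resid, Q = batch_fit(p, dt)
print(theta, np.diag(Q))
```

Solving the normal equations directly mirrors the closed-form solution; for ill-conditioned regressors a QR-based solver such as `np.linalg.lstsq` would be the safer choice.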

In such a way we learn a cow model for each cow in a herd using measured tracking data. The optimal parameters are found and the characteristics of the random vector $w_i^\tau$ are determined for each cow $i = 1, \ldots, m$ to yield parameters for the entire herd. To make the entire process clearer, we have codified it as Algorithm 3.

Algorithm 3 Batch Identification of Group Dynamics
for All agents in the group do
    Apply the measured data to (7.16)
    Use $\theta_i^*$ in (7.13)
    This defines the model for agent i
end for


7.3.4  Recursive  Method 


Algorithm 4 Recursive Identification of Group Dynamics
for All agents in the group do
    Initialize parameters $\theta_i$ and $P_i$ to an arbitrary value
    Use $P_i$ to calculate $K_i$
end for
loop
    for Each agent in the group do
        Apply one position to (7.18) and (7.20), using $K_i$
        Use resulting $P_i$ to calculate $K_i$ for the next iteration
        Use the updated $\theta_i$ in (7.13)
        This defines the model for agent i for one time step
    end for
end loop


The Least Squares method can also be formulated recursively, so that each new available measurement becomes integrated into the parameter estimates, tuning them as time progresses. This method would be particularly useful for the parameter identification step in an adaptive control loop.

First, let $\phi_i^\tau = [\, \phi_{u_i}^{\tau T} \;\; \phi_{v_i}^{\tau T} \,]^T$, and $y_i^\tau = [\, u_i^{\tau+1} \;\; v_i^{\tau+1} \,]^T$. We wish to tune parameters dynamically according to

$\theta_i^\tau = \theta_i^{\tau-1} + K_i^\tau (y_i^\tau - \phi_i^\tau \theta_i^{\tau-1}),$ (7.18)

where

$K_i^\tau = P_i^{\tau-1} \phi_i^{\tau T} (\lambda_i I_2 + \phi_i^\tau P_i^{\tau-1} \phi_i^{\tau T})^{-1},$ (7.19)

and

$P_i^\tau = \big( P_i^{\tau-1} - P_i^{\tau-1} \phi_i^{\tau T} (\lambda_i I_2 + \phi_i^\tau P_i^{\tau-1} \phi_i^{\tau T})^{-1} \phi_i^\tau P_i^{\tau-1} \big) / \lambda_i,$ (7.20)

where $K_i^\tau$ is the parameter gain, $P_i^\tau$ is the parameter covariance matrix, $\lambda_i$ is a forgetting factor ($0 < \lambda_i \le 1$), and $I_2$ is the 2×2 identity matrix. This standard algorithm is stated here without derivation. The interested reader can find a thorough discussion in [59]. The algorithm for this method is given in Algorithm 4.
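One recursive update can be sketched as follows. The function `rls_step` implements the standard recursive Least Squares update with a forgetting factor for a two-row regressor matrix; the toy model driving it (damping plus a constant force, with velocities supplied directly rather than differenced from positions), the initial covariance, and the forgetting factor value are assumptions made for illustration.

```python
import numpy as np

def rls_step(theta, P, phi, y, lam):
    """One recursive Least Squares update with forgetting factor lam.

    phi: (2, n) regressor matrix, y: (2,) measurement,
    theta: (n,) parameter estimate, P: (n, n) parameter covariance.
    """
    S = lam * np.eye(2) + phi @ P @ phi.T
    K = P @ phi.T @ np.linalg.inv(S)             # parameter gain
    theta_new = theta + K @ (y - phi @ theta)    # correct by prediction error
    P_new = (P - K @ phi @ P) / lam              # covariance update
    return theta_new, P_new

# Hypothetical toy model: vel[t+1] = a * vel[t] + c + noise
rng = np.random.default_rng(3)
a_true, c_true, sigma = 0.7, np.array([0.1, -0.05]), 0.02

theta = np.zeros(3)            # arbitrary initial estimate of [a, c_east, c_north]
P = 1e3 * np.eye(3)            # large initial covariance: little prior confidence
vel = np.zeros(2)
for _ in range(3000):
    vel_next = a_true * vel + c_true + sigma * rng.standard_normal(2)
    phi = np.array([[vel[0], 1.0, 0.0],
                    [vel[1], 0.0, 1.0]])
    theta, P = rls_step(theta, P, phi, vel_next, lam=0.999)
    vel = vel_next

print(theta)
```

A forgetting factor below one discounts old data, so smaller values track parameter changes faster at the price of noisier estimates.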


Note that the kinds of systems under consideration are likely to have time varying parameters. For instance, cows are likely to change their behavior throughout the day in accordance with sunlight, temperature, their hunger and thirst, etc. For this reason, we would expect the parameter-tracking properties of the recursive algorithm with a forgetting factor to be advantageous. The recursive Least Squares algorithm
can  be  used  to  learn  the  model  while  it  is  simultaneously  being  used  for  prediction  in  a 
control  algorithm.  This  would  result  in  an  adaptive  control  algorithm  for  distributed 
groups.  The  results  presented  in  the  following  sections  use  the  Batch  method.  We 
save  a  detailed  study  of  on-line  and  distributed  learning  algorithms  for  future  work. 
Figure 7-3: The sensor box is shown here with lid closed (left) and lid open (right). The box is roughly 21.5 cm × 12.0 cm × 5.5 cm and weighs approximately 1 kg. It is equipped with a GPS receiver, wireless networking features, and a suite of sensing and actuation capabilities. The Lithium-Ion batteries and solar panel allow for indefinite operation under normal conditions. It can also modularly accommodate expansion boards for various other applications.

7.4  Data  Collection  Experiments 

7.4.1  Animal  Monitoring  Hardware 

We  have  developed  a  small  light-weight  box  (see  Figure  7-3)  for  data  collection  and 
animal  control  for  use  during  our  field  experiments.  The  box  contains  electronics 
for  recording  the  GPS  location  of  the  animal  as  well  as  other  sensor  data  which 
we  do  not  use  in  this  work  (a  3-axis  accelerometer,  a  3-axis  magnetometer,  and  a 
temperature  sensor).  The  box  also  contains  electronics  for  networking  with  other 
boxes,  and  for  applying  sound  and  electrical  stimuli  to  the  animal,  though  the  stimuli 
were  not  applied  during  the  data  collection  experiments  described  here.  Building  on 
the  pioneering  work  of  [15, 113]  on  animal  monitoring  hardware,  we  improved  the 
performance  of  the  device  by  mounting  it  on  top  of  the  animal’s  head,  as  shown  in 
Figure  7-4,  instead  of  packaging  it  as  a  collar.  We  found  the  head  mounted  device 
improved  several  aspects  of  the  device’s  performance  compared  to  the  previous  collar 
mounting:  (1)  the  GPS  satellites  were  more  likely  to  be  visible  from  the  top  of  the 
head,  (2)  solar  panels  on  the  box  were  more  likely  to  receive  direct  sun  exposure,  (3) 
networking  radio  communication  was  less  obstructed  by  the  animal’s  body,  (4)  the 
animal  was  less  able  to  deliberately  rotate  the  box,  and  (5)  the  box  was  prevented 
from  being  dipped  in  water  or  mud  and  was  generally  better  protected. 
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Figure  7-4:  The  sensor  box  is  mounted  to  the  head  of  the  cow  with  a  custom  fitted 
apparatus  made  of  fabric  and  plastic.  The  apparatus  is  designed  to  use  the  cow’s  ears 
to  keep  the  box  in  an  upright  position,  as  shown  in  this  figure. 

Our sensor box is approximately 21.5 cm × 12.0 cm × 5.5 cm and weighs approximately 1 kg. The processor is a 32-bit ARM7TDMI CPU (NXP model LPC2148) with 512 kB program memory, 40 kB RAM, USB, and a 10-bit A/D converter. The device also has 256 kB FRAM (external non-volatile memory with no rewrite limit) and a removable SD card with 2 GB storage capacity. Data can be easily and quickly downloaded to a computer by physically transferring the SD card, or by downloading remotely via the radios. There are two hardware serial ports, which are multiplexed for a total of five. The sensors in the box include a GPS engine, 3-axis accelerometer, 3-axis magnetic compass, and an ambient air temperature sensor. There are many general purpose analogue and digital I/O lines, so additional sensors can be included.

The communication system consists of two radios. Firstly, a 900 MHz radio (Aerocomm AC4790) with 1 watt transmit power is used for long range, low bandwidth communication. This radio has a claimed 32 km range and a claimed 57600 b/s transfer rate. However, we observed a maximum of only 2 km range and a data transfer rate of only 1000 b/s. This is particularly odd as the flat, remote environment in which the radios were tested should have been ideal for radio transmission. The cause of the poor performance of this radio is still unknown. Secondly, the box uses a Bluetooth radio with 100 m range and 100 kb/s data rate for short range, high bandwidth communication.

Power is provided by a bank of 8 Lithium-Ion batteries with a total capacity of 16 watt-hours. The batteries are continuously recharged by a solar panel mounted on the top of the box, allowing the box to run indefinitely under normal conditions. The batteries have enough capacity for several days of operation without the solar panels.

Figure 7-5: The GPS positions of the cows are shown superimposed on satellite images of the paddock in which the data were collected. The left image shows data collected from three cows in the first trial between February 2-5, 2007. The right image shows the data collected from ten cows in the second trial between July 9-11, 2007.

Finally,  we  have  a  two-tier  animal  control  system  consisting  of  a  set  of  speakers 
for  applying  arbitrary,  differential  sound  stimuli  and  a  set  of  electrodes  that  enable 
the  application  of  differential  electrical  stimuli.  The  animal  control  system  was  not 
used  during  the  collection  of  the  data  described  in  this  chapter. 

The box’s operating system is a custom designed collaborative multitasking architecture. Processes run as scheduled events which can be scheduled to run at millisecond intervals with no preemption or real-time constraints. The software supports arbitrary network topologies for communication. Users interact with the system via a serial console or a Java user interface. These can be accessed directly through the serial port or remotely over either of the radios. This allows remote reconfiguration of the monitoring devices in the field. The operating system can be completely reprogrammed using an attached serial cable, remotely over the radio, or by placing a file on the SD card.
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7.4.2  Experimental  Methodology 


Data  were  collected  during  two  trials,  the  first  taking  place  from  February  2-5,  2007 
and  the  second  from  July  9-11,  2007,  during  which  time  three  head  and  ten  head  of 
cows  were  monitored,  respectively,  using  the  sensor  boxes  described  above.  During 
both trials cows were allowed access to a 466 ha, or 4.66 km², paddock (named 10B) located on the US Department of Agriculture-Agricultural Research Service’s (USDA-ARS) Jornada Experimental Range (JER) in Southern New Mexico (32° 37’ N, 106° 45’ W), which is approximately 37 km Northeast of the city of Las Cruces at an elevation of approximately 1260 m above sea level. The climate of this arid area has ambient air temperatures that range from a high of 36° C in June to below 13° C in January, with 52% of the mean annual precipitation (230 mm) falling as rain between July and September [73, 110]. Grasses (39% to 46%) and forbs (36% to 49%) comprise the predominant vegetation while woody shrubs compose 14% to 19% of the remaining standing crop [3, 45] that grows in a mosaic pattern across this relatively flat landscape composed of three major landforms [65].

In  the  first  trial,  three  free-ranging  mature  beef  cattle  of  Hereford  and  Hereford  x 
Brangus  genetics,  labeled  Cow  1-Cow  3,  were  fitted  with  the  sensor  boxes  described 
above.  Data  were  collected  over  four  days  from  February  2-5,  2007  at  a  data  collection 
rate  of  1Hz.  In  the  second  trial,  ten  free-ranging  mature  beef  cattle  of  similar  genetics, 
labeled  Cow  1-Cow  10  were  fitted  with  the  sensor  boxes.  Data  were  collected  at  1Hz 
over three days from July 9-11, 2007. Cows 1, 2, and 3 were the same three animals in both the first and second trials. The paddock for the experiments was fenced
with  the  geometry  shown  in  Figure  7-5.  During  these  two  trials,  the  animals  received 
no  audio  or  electric  cues  from  the  sensor  boxes. 

When they are introduced to a new paddock, cows commonly trace out the perimeter to familiarize themselves with the extent of their new environment [1]. They then concentrate their activities on certain areas depending upon vegetation and other factors. During the first trial, shown on the left of Figure 7-5, the cows had been recently introduced to paddock 10B from another neighboring paddock (though they had previous experience in paddock 10B), and their perimeter tracing behavior is evident in the plot. In the second trial (on the right of Figure 7-5), the cows had already been in the paddock for some time before data were collected.

7.5  Modeling  a  Group  of  Cows 

The  method  presented  in  Section  7.3  was  used  to  model  the  dynamics  of  a  group  of 
three cows, as well as a group of ten cows. Data collected as described in Section 7.4 were used for fitting the model parameters and for evaluating the resulting model.
We  will  first  present  modeling  results  for  the  three  cows  as  it  is  less  complicated  to 
interpret  data  for  a  smaller  group,  then  we  will  show  results  for  the  ten  cows.  The 
total  number  of  agent-to-agent  interaction  forces  grows  like  the  square  of  the  number 
of  agents,  hence  the  difficulty  in  efficiently  displaying  results  for  large  groups.  Finally 
we  discuss  the  problem  of  validating  the  learned  models,  and  propose  a  statistically 
justified  method  for  validation.  Results  of  the  validation  method  are  shown  for  both 
the  three  and  ten  cow  models. 

7.5.1  Three  Cows 

The  dynamics  of  a  cow  group  are  known  to  be  modal  [90],  in  the  sense  that  model 
parameters  are  approximately  constant  over  contiguous  intervals,  but  can  change 
rapidly  when  switching  between  such  intervals,  for  example  when  the  group  transitions 
from  resting  to  foraging.  We  intentionally  selected  a  52  minute  interval  of  data 
(from  approximately  18:02hrs  to  18:54hrs  on  February  2,  2007)  for  learning  model 
parameters  that  corresponded  to  a  stretch  of  time  when  the  herd  was  apparently  in 
a  constant  foraging  mode.  For  each  cow,  the  data  used  for  the  Least  Squares  fitting 
consisted  of  3100  GPS  position  entries  collected  at  1Hz.  The  data  for  all  animals  were 
artificially  synchronized  to  a  common  clock  using  a  standard  linear  interpolation.  The 
characteristic  time  scale  of  cow  dynamics  is  considerably  longer  than  1  second  (that 
is  to  say,  cows  move  little  in  the  span  of  1  second),  thus  such  an  interpolation  is 
expected  to  have  a  negligible  effect  on  modeling  results. 
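The synchronization step can be sketched with NumPy's one-dimensional linear interpolation. The function name `resample_track`, the sample times, and the positions below are hypothetical values chosen for illustration; they are not data from the trials.

```python
import numpy as np

def resample_track(t_obs, east, north, t_common):
    """Linearly interpolate one animal's GPS track onto a common clock."""
    return np.column_stack([np.interp(t_common, t_obs, east),
                            np.interp(t_common, t_obs, north)])

# Hypothetical fixes logged at irregular times (seconds) for one animal
t_obs = np.array([0.2, 1.3, 2.1, 3.4])
east = np.array([0.0, 1.1, 1.9, 3.2])    # metres; steady 1 m/s drift east
north = np.zeros(4)

t_common = np.arange(1.0, 4.0)           # shared 1 Hz clock: 1, 2, 3 s
track = resample_track(t_obs, east, north, t_common)
print(track[:, 0])
```

Because the example track moves at constant speed, the interpolated east positions fall exactly on the underlying line; for real data the result is only an approximation between fixes.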
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Figure  7-6:  The  agent-to-agent  interaction  forces  are  shown  for  the  three  cows.  Each 
curve  represents  the  size  of  the  force  imposed  by  one  cow  on  another  as  a  function  of 
the  distance  between  the  cows.  A  positive  value  is  attractive  while  a  negative  value 
is  repulsive. 

The data were used to find model parameters as described in Section 7.3. The panels in Figure 7-6 show the agent-to-agent force magnitudes $\|f_{ij}(p_i, p_j)\|$ for the three cows. For each cow, the two curves show the force imposed by each of the two other cows in the group. Note that the forces are not necessarily pair-wise symmetric, that is, $\|f_{ij}\| \neq \|f_{ji}\|$ in general. The force curves are useful for analyzing behavioral traits of the cows. It is well known that groups of cows have complicated social subgroupings and hierarchies [58]. The plots indicate that Cows 1 and 3 had an affinity for one another, while Cow 2 was comparatively not very attractive to, or attracted by, Cows 1 and 3. We will reexamine the behavior of Cow 2 below in the context of the ten cow group.

Figure 7-7: The environment-to-agent force fields are shown for the three cows. The heavy dots indicate the centers of the Gaussian functions and the arrows show the forces produced by the learned vector field. The continuous curve marks the actual cow’s path over the region.

The environment-to-agent vector fields are shown in the panels of Figure 7-7 for the three cows. The heavy dots show the centers of the Gaussian basis functions, $\gamma_{ik}$, the arrows show the direction and magnitude of the force felt by a cow at each point, and the curve indicates the position data used for learning. The Gaussian centers were spaced over an even grid containing the trajectory of the cow. If the trajectory did not come within one standard deviation, $\sigma_{ik}$, of a Gaussian function, the Gaussian was dropped from the network. This primitive pruning algorithm was used for simplicity; more complex algorithms could be employed. The Gaussian widths were chosen to be 2/3 the length of the grid space occupied by the Gaussian. This width was found to give good performance with our data. One could imagine including the widths as free parameters in the Least Squares cost function (7.14), but the cost function becomes non-convex in this case and is therefore very difficult to optimize.
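The grid placement and pruning rule can be sketched as follows. The function name `prune_centers`, the interpretation of the width rule as a standard deviation equal to 2/3 of the grid spacing, and the diagonal test trajectory are assumptions made for illustration.

```python
import numpy as np

def prune_centers(trajectory, spacing):
    """Place Gaussian centers on an even grid covering the trajectory and keep
    only those that the trajectory approaches to within one standard deviation.
    """
    sigma = (2.0 / 3.0) * spacing            # assumed reading of the 2/3 rule
    lo, hi = trajectory.min(axis=0), trajectory.max(axis=0)
    xs = np.arange(lo[0], hi[0] + spacing, spacing)
    ys = np.arange(lo[1], hi[1] + spacing, spacing)
    centers = np.array([(x, y) for x in xs for y in ys])
    # Distance from every center to every trajectory point
    d = np.linalg.norm(centers[:, None, :] - trajectory[None, :, :], axis=2)
    keep = d.min(axis=1) <= sigma
    return centers[keep], sigma

# Hypothetical trajectory: a straight diagonal walk from (0, 0) to (10, 10)
s = np.linspace(0.0, 10.0, 101)
trajectory = np.column_stack([s, s])
kept, sigma = prune_centers(trajectory, spacing=5.0)
print(len(kept), sigma)
```

On this 3 x 3 grid only the three centers lying on the diagonal survive, since the off-diagonal centers are at least 5/sqrt(2), about 3.54, from the path, which exceeds the width of about 3.33.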


7.5.2  Ten  Cows 

For each cow, data consisted of 4000 GPS position entries collected at 1Hz during the second trial described in Section 7.4.2. As before, care was taken to use a contiguous stretch of data (from approximately 11:33hrs to 12:40hrs on July 9, 2007) during which the cow group appeared to be in a foraging mode. Cows 1, 2, and 3 were the same animals as in the first trial. The data for all animals were artificially synchronized to a common clock using a standard linear interpolation as was done for the three cow data.

The data were used to find model parameters as described in Section 7.3. The panels in Figure 7-8 show the magnitude of the agent-to-agent force for the ten cows. The number of agent-to-agent interaction forces is much higher than for three cows (10 × 9 as opposed to 3 × 2), so the plots are correspondingly more complicated. In particular, the force plot for each animal shows ten curves. Each of the nine thin curves represents the magnitude of force caused by one of the nine other animals as a function of separation distance. The thick curve shows the mean over all nine force curves. Despite considerable variation over animals (including some inverted force curves), the mean force felt by any one animal as a result of its proximity to all of the others is relatively similar, as indicated by the mean force curve.

The environment-to-agent vector fields are shown in the panels of Figure 7-9 for the ten cows. The heavy dots show the centers of the Gaussian basis functions, $\gamma_{ik}$, the arrows show the direction and magnitude of the force felt by a cow at each point, and the curve shows the position data used for regression. The Gaussian centers were spaced and pruned as described for the three cow trial.

To  demonstrate  the  potential  usefulness  of  the  learned  model  to  study  animal 
behavior,  consider  again  the  behavior  of  Cow  2  in  the  context  of  the  ten  cow  group. 
By  comparing  the  mean  force  curves  in  Figure  7-8  with  the  curves  in  Figure  7-6,  we 
see  that  Cow  2  does  not  tend  to  stay  as  far  from  the  other  cows  in  the  larger  group 
as  in  the  smaller  group.  It  seems,  for  example,  that  Cow  3  stays  farther  from  the 
other  cows  than  does  Cow  2  in  the  larger  group.  The  apparent  dependence  of  animal 
behavior  on  group  size  is  a  property  of  interest  to  the  animal  behavioral  sciences.  Of 
course,  there  are  a  number  of  other  factors  that  could  be  responsible  for  this  behavior, 
including  time  of  year,  the  animals’  physiological  state,  weather  conditions,  and  the 
quality and quantity of standing crop. However, by analyzing the learned model we have generated an interesting hypothesis about cow behavior, which can be used to guide the design of further experiments.

7.5.3  Model  Validation 

In terms of signal processing, our learning algorithm can be seen as taking a time-correlated velocity signal and producing model parameters and a residual error signal. Velocity data that are rich in temporal correlation are good for modeling. Also, if our learned model is successful in capturing the relevant correlation of the velocity signal, the residual error signal will have little temporal correlation. More plainly, we want our velocity signal not to be white, and our residual error signal to be white. Therefore, we are interested in testing for “whiteness” in each of these signals by comparing them against a 90% whiteness confidence interval.

Figure 7-8: The agent-to-agent interaction forces are shown for the group of 10 cows. Each thin, dashed curve represents the size of the force imposed by one cow on another as a function of the distance between the cows. The thick, solid curve shows a mean over all of the individual force curves. A positive value is attractive while a negative value is repulsive.

Figure 7-9: The environment-to-agent force fields are shown for the group of 10 cows. Heavy dots indicate the centers of the Gaussian functions and the arrows show the force produced by the learned vector field. The continuous curve marks the cow’s actual path over the region.

To  be  specific,  consider  some  random  signal  x(t)  generated  by  a  stationary  Gaus¬ 
sian  white  noise  process  X(t).  Each  point  on  the  empirical  auto-covariance  function, 

Kx{t)  =  ^- -^x(t-T)x(t),  (7.21) 

t—T 

is  asymptotically  normally  distributed,  with  zero  mean  and  variance  equal  to 
(see  [59]  Lemma  9.A1,  or  [72]).  The  90%  confidence  interval  is  then  found  from  the 
inverse  cumulative  normal  distribution  to  have  boundaries  defined  by  the  curves 

C5(r)  =  \[y^Kx{ 0)err1(2  x  .05  -  1)  and  C95{t)  =  ^^Kx(0)err1(2  x  .95  -  1), 

meaning  the  process  X{t)  would  produce  a  value  Kx(t )  below  C5(r)  with  probability 
.05  and  below  C'95(r)  with  probability  .95  for  each  point  r. 

Applying this reasoning to our velocity $v_i(t)$ (with Eastern and Northern components)
and residual error $w_i(t)$ signals, we validate the learned model by examining the
empirical auto-covariance functions,

$$K_{v_i}(\tau) = \frac{1}{T-\tau} \sum_{t=\tau}^{T} v_i(t-\tau)\,v_i(t)^T \quad \text{and} \quad K_{w_i}(\tau) = \frac{1}{T-\tau} \sum_{t=\tau}^{T} w_i(t-\tau)\,w_i(t)^T,$$

respectively, where the time of the sample is now explicitly written as an argument, for
example $w_i(t) = w_i^t$. If the velocity $v_i(t)$ and the residual error $w_i(t)$ were generated
by a white noise process, we would expect $K_{v_i}(\tau)$ and $K_{w_i}(\tau)$ to fall within their
respective whiteness confidence intervals with probability .9 at each $\tau$. Again, we
want the velocity signal to fail this test and the residual error signal to pass it.
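
The test just described can be sketched in a few lines of Python. This is a minimal illustration for a single scalar, zero-mean component only, not the code used in this work. The function names `autocov` and `fraction_inside` are hypothetical, and the bound assumes the auto-covariance at each lag is asymptotically normal with standard deviation $K_x(0)/\sqrt{T-\tau}$.

```python
import math
from statistics import NormalDist


def autocov(x, tau):
    """Empirical auto-covariance of a zero-mean scalar signal at lag tau."""
    n = len(x)
    # n - tau products are available at this lag, so normalize by n - tau.
    return sum(x[t - tau] * x[t] for t in range(tau, n)) / (n - tau)


def fraction_inside(x, max_lag, level=0.90):
    """Fraction of lags 1..max_lag whose auto-covariance lies inside the
    two-sided whiteness confidence interval at the given level."""
    # Standard normal quantile; about 1.645 for a 90% interval.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    k0 = autocov(x, 0)
    inside = 0
    for tau in range(1, max_lag + 1):
        # Assumed asymptotic standard deviation: k0 / sqrt(n - tau).
        bound = z * k0 / math.sqrt(len(x) - tau)
        if bound >= abs(autocov(x, tau)):
            inside += 1
    return inside / max_lag
```

A white signal should return a fraction near the chosen level, while a strongly correlated signal should return a fraction near zero. The percentages reported for each component in the tables of this section are fractions of this kind.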

There are other tests for whiteness, but this is the simplest one with a rigorous
statistical interpretation [59]. This whiteness test takes the place of leave-one-out
validation, or other similar validation methods common in machine learning applications.
We cannot use such methods because our data is not IID, a key assumption
in most machine learning algorithms. Indeed, our model is specifically trying to capture
correlation between data points, so to leave one data point out would obscure
precisely the relationship we want to learn.

Figure 7-10: The empirical auto-covariance function for the Eastern component of the
velocity is shown in the top plot, and for the error residual in the bottom plot. The
dotted lines indicate a 90% whiteness confidence interval, meaning that a stationary,
Gaussian white noise process would have generated an empirical auto-covariance
inside the interval with probability .9 at each point. By this metric, the velocity
signal is not white and the residual error signal is "nearly white," indicating a good
model has been learned for the data.

The top of Figure 7-10 shows the auto-covariance from the three cow trial of
the Eastern component of the velocity for Cow 1, and the bottom figure shows the
auto-covariance of the corresponding residual error. Notice there is strong temporal
correlation in the velocity, and all points in the plot lie outside the confidence interval;
it therefore fails the whiteness test, as desired. For the residual error auto-covariance,
there is apparently little temporal correlation and a large majority of the points
lie inside the whiteness confidence interval; it therefore passes the whiteness test.
Thus, by this measure, the algorithm has done a good job of producing a model
to describe the cow's dynamics. The plots for the other components of the auto-covariance
functions and for the other cows in the three cow trial are excluded in
the interests of space. Instead, we summarize the results in Table 7.1, which shows
for each cow, and for each of the four components of the auto-covariance functions
$K_{w_i}(\tau)$ and $K_{v_i}(\tau)$, the percentage of points within the 90% whiteness interval. The
results show that the velocity signals for all cows fail the whiteness test (as desired),
while the residual error signals can all be considered nearly white in that nearly 90%
of their values were within the confidence interval.

                    Velocity              Residual
Cow Number        1     2     3         1     2     3
East-East         0     0     0        76    81    87
East-North        0     0     5        73    78    83
North-East       21     4    17        81    84    91
North-North       0     0     0        66    68    87

Table 7.1: The table shows what percentage of points lie within the 90% whiteness
confidence interval for each of the 3 cows in the first trial, and for each of the four
components of the auto-covariance function. According to this test, the velocity signal
is not white, and the residual error is approximately white, so the model fits the data
well.

The whiteness test was also carried out for ten cows with similar results as summarized
in Table 7.2. The results in the table show that all of the residual errors for
the ten cow model are nearly white. As for $K_{v_i}$ in this case, all of the points for all of
the components and all of the cows lie outside of the whiteness confidence interval,
therefore the velocity is very likely not white for any cow.

                                   Residual
Cow Number      1    2    3    4    5    6    7    8    9   10
East-East      75   85   87   81   69   85   87   79   71   84
East-North     69   82   76   83   75   78   77   74   83        (one value missing in the source; column positions uncertain)
North-East     79   80   82   84   72   80   75   80             (two values missing in the source; column positions uncertain)
North-North    68   75   75   84   62   61   74   73   69   83

Table 7.2: The table shows what percentage of points lie within the 90% whiteness
confidence interval for each of the 10 cows, and for each of the four components of
the residual error auto-covariance function. By this metric, the residual errors for all
cows are approximately white. For the velocity auto-covariance function (not shown
in the table), no point is within the interval for any cow, thus the velocity is very
likely not white. By this test, the ten-cow model successfully fits the data.


7.6 Synthetic Control

Simulation experiments were carried out with the model fitted in Section 7.5.1. We
simulated a group of three simple mobile robots controlled to have the dynamics
in (7.1) with the parameters found in Section 7.5.1. These equations were iterated
forward in time in a Matlab environment with the robots started from the same initial
positions as the cows. The simulation procedure is summarized in Algorithm 5.

Algorithm 5 Synthetic Control Algorithm

Execute Algorithm 3 to obtain a set of optimal parameters for each agent
Set initial conditions for simulated group of robots
loop
    for each robot in the group do
        Use the current state of all the robots and $\theta_i$ obtained from Algorithm 3.
        Apply these to the dynamical equations for agent i (7.1) to produce the next
        robot state
    end for
end loop

The trajectories of the robots from a typical simulation are shown in the left side
of Figure 7-11 laid over a schematic showing the fences of the paddock where the
actual cow data were recorded. The trajectories of the simulation are similar to those
of the real cows. Most importantly, the simulated robots track the fence lines, as did
the real cows. This tendency is captured solely through the agent-to-environment
force field (described in Section 7.2.3), as the model has no direct knowledge of where
fence lines may lie. Furthermore, statistics were gathered for the simulated robots
and compared with those from the cow data. Figure 7-12 shows a comparison of
the two sets of statistics. Specifically, the distance between cows over time and the
speed of the cows over time have similar mean and standard deviation for the real and
simulated data. Thus the model preserves global properties of the group, as measured
by these statistics.

One should expect the trajectories of the simulation to be qualitatively similar
to the actual training data, but the question of how similar is not a simple one.
The model we have constructed is a random process, and two different sets of data
generated by the same random process will almost certainly be different. It is also
not informative to look at, for example, the mean distance between points of the
actual and simulated data, since, again, two signals from the same random process
can generate trajectories arbitrarily far from one another. The appropriate test for
model validation is the whiteness test described in Section 7.5.3. We show Figures
7-11 and 7-12 only to indicate that the properties verified with the whiteness test
lead, in practice, to a qualitative match in performance.

It is also important to point out that comparing these simulation results to the cow
data is not the same as testing a learned model on training data, a common pitfall in
machine learning applications. Indeed, the only training data given to the simulation
are the initial positions of the robots. The model recursively generates its own data
points which then become inputs for successive time steps. This is a manifestation
of the fact that system identification takes place in a non-IID setting, so much of the
intuition that applies in typical machine learning problems is not applicable.
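
The recursive character of the simulation can be sketched as a closed-loop rollout in which only the initial states come from data. This is a minimal Python sketch rather than the Matlab code used for the experiments. The names `rollout` and `toy_step` are hypothetical, and `toy_step` is a deterministic stand-in that pulls each agent toward the group mean. It is not the fitted, stochastic dynamics (7.1).

```python
def rollout(step, initial_states, n_steps):
    """Iterate a per-agent update forward in time.

    step(states, i) returns the next state of agent i given the current
    states of all agents. Each output becomes the input of the next step.
    """
    history = [[list(s) for s in initial_states]]
    for _ in range(n_steps):
        current = history[-1]
        # All agents are updated from the same snapshot of the group.
        history.append([step(current, i) for i in range(len(current))])
    return history


def toy_step(states, i):
    """Hypothetical placeholder dynamics: drift toward the group mean."""
    n = len(states)
    mean = [sum(s[k] for s in states) / n for k in range(2)]
    return [states[i][k] + 0.1 * (mean[k] - states[i][k]) for k in range(2)]


if __name__ == "__main__":
    start = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    final = rollout(toy_step, start, 50)[-1]
    print(final)
```

Swapping `toy_step` for an update built from identified parameters would give a loop of the same shape as Algorithm 5. After the first step, no recorded data enter the loop.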

This simulation study suggests that our model equations can be used to control
a group of robots to exhibit the behavior of the modeled group. In this way controllers
can be automatically synthesized for robots to mimic groups that have some
desirable collective behavior, such as flocking or herding. One can also imagine
introducing artificial members of a group without changing the group dynamics (i.e.
without "being noticed") or for the purpose of modifying the group dynamics in a
non-disruptive way, for example to influence collective decision making in natural
groups, as was done in [39].

Figure 7-11: The left plot shows trajectories of a team of simulated robots controlled
to behave like a group of cows. The robots use dynamical laws generated from the
procedure described in this chapter. Their trajectories are superimposed over the
fence lines of the paddock where the original cow data were collected, though they
have no direct knowledge of fence positions. The right picture shows the actual cow
data over the same time window.


7.7  Synopsis 

In this chapter, we presented a method to generate behavior models of groups of
dynamical agents, such as cow herds, using observations of the agents' positions over
time. We formulated a physically motivated difference equation model, and used Least
Squares system identification to fit the model to data. We demonstrated the method
by learning models for a group of three cows and a group of ten cows using GPS
position data. The position data were collected with specially designed sensor boxes
fitted to the heads of free-ranging cows. An important and surprising contribution
of this chapter is the demonstration that a minimalist approach to modeling group
interactions using only position data leads to meaningful group dynamical models.

Figure 7-12: The bar charts compare statistics for the actual cow data and the simulated
robots. The top charts show the mean and standard deviation of the distance
from one cow to the other two cows in the group. The bottom charts show the mean
and standard deviation of the speed of each cow.

Our approach is minimalist in that no information is included in the model about
the geometry and configuration of the environment, nor about any attractive (e.g.
vegetation) or repulsive (e.g. fences) features in the environment. It was shown in
Section 7.6, however, that our method can be used to infer the locations of such
features, since the robots avoided a fence obstacle even though they were given no
prior indication of the fence's existence. An interesting research direction is to investigate
the trade-offs between including additional information about features in the
environment and the quality of the resulting model. More specifically, we can explicitly
model obstacles in the space as a force field with some free parameters that are
learned from the position data. We can also include dependencies upon weather and
other ambient environmental conditions for which measurements are available. The
question is, does the performance improvement of the learned model justify the extra
complexity and prior information required for such a model? Our preliminary studies
with explicit fence models show that this additional information leads to models that
give similar behavior to those without the explicit obstacle features, but the explicit
inclusion of the obstacle gives the ability to enforce hard position constraints on the
agents. We generally prefer the minimalist approach described in this chapter in that
it is amenable to situations where no detailed environmental information is available.

Our work has provided some insights into developing a minimalist approach to
modeling group behavior; however, many questions remain to be resolved. Learning
models of complex natural and artificial groups is an exercise in balancing trade-offs
between model fidelity and model complexity. The systems we are interested in
modeling are too sophisticated to characterize their motion in its entirety, but we
have shown in this chapter that a simple model structure with a simple learning
algorithm can give enough prediction power to be practically useful for controlling,
simulating, and interacting with groups of dynamical agents.


Chapter 8


Conclusions,  Lessons  Learned,  and 
Future  Work 


This thesis considers a method, based on gradient optimization, for controlling groups
of agents to reach a goal configuration. We focus on the problem of deploying robots
over an environment to do sensing, a task called coverage. We show that coverage is
actually of a general enough nature to represent a number of problems not normally
associated with it, for example consensus and herding. We augment the multi-agent
gradient controller with learning to allow robots to adapt to unknown environmental
conditions. The learning is incorporated with provable performance and stability
guarantees using a Lyapunov proof technique. We implemented the multi-agent learning
controller on a group of 16 mobile robots and performed experiments in which
they had to learn the intensity of light in the environment. We also implemented
a multi-agent coverage controller on a group of flying quad-rotor robots with downward
facing cameras. The controller used a realistic model of the camera as a sensor.
Experiments were performed with 3 quad-rotor robots and were shown to provide
multi-robot sensor coverage as predicted. Finally, we used the multi-robot dynamics
to model the motion of cows in a herd. We used system identification techniques to
tune the model parameters using GPS positions from a herd of actual cows.

In the course of this research we learned several lessons concerning the design
and implementation of multi-robot controllers. Firstly, we learned that the coverage
optimization can represent many different multi-robot behaviors. This is one of the
main themes of this thesis: the unification of multi-robot control under the umbrella
of a single optimization problem. That optimization problem can be specialized to
specific multi-robot tasks and specific robot capabilities, and a stable controller can
be derived from the gradient of the cost function. We show that this simple formula
for designing multi-robot controllers produces robust, practical controllers that are
feasible on a range of robot platforms.

A second lesson that we learned is that in a multi-robot setting, consensus algorithms
can sometimes substitute for centralized knowledge. For example, consensus
was used in our learning algorithm in Chapter 4 to propagate sensor information
around the network. Each robot was able to asymptotically learn the sensory function
as well as if they had direct access to all the robots' sensor measurements. Indeed,
it appears that consensus algorithms could be a fundamental and practical
tool for enabling distributed learning in general, and have compelling parallels with
distributed learning mechanisms in biological systems.
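
The way consensus can stand in for centralized knowledge is easiest to see in the textbook discrete-time averaging iteration, sketched below in Python. This is a generic illustration under stated assumptions, not the parameter-consensus law of Chapter 4. The function names, the step size `eps`, and the four-node path graph in the usage note are all hypothetical. The graph is assumed connected and undirected, with `eps` smaller than one over the largest node degree.

```python
def consensus_step(values, neighbors, eps):
    """One synchronous update: every node moves toward its neighbors' values."""
    return [
        x + eps * sum(values[j] - x for j in neighbors[i])
        for i, x in enumerate(values)
    ]


def run_consensus(values, neighbors, eps=0.3, n_steps=200):
    """Repeat the local update; on a connected undirected graph the values
    approach the average of the initial values."""
    for _ in range(n_steps):
        values = consensus_step(values, neighbors, eps)
    return values


if __name__ == "__main__":
    # Path graph 0 - 1 - 2 - 3: each node only talks to adjacent nodes.
    path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    print(run_consensus([0.0, 4.0, 8.0, 20.0], path))
```

No node ever sees all four initial values, yet each one ends up holding their average, which is the quantity a central coordinator would have computed.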

Finally,  we  learned  that  analysis  is  only  useful  up  to  a  point,  after  which  the  true 
proof  of  performance  is  in  the  implementation  of  the  controller  on  real  robot  platforms. 
This  is  a  fundamental  engineering  point  of  view.  No  tractable  mathematical  model 
will  be  able  to  capture  the  intricacies  of  the  dynamics,  noise,  and  computational 
processes,  so  the  final  proof  must  be  an  implementation.  We  have  found  that  the 
best  policy  is  to  model  the  systems  as  simply  as  possible,  derive  as  many  properties 
as  possible  from  the  simple  model,  test  in  simulation  to  make  sure  the  model  makes 
sense,  then  implement  on  actual  robot  platforms  to  verify  the  analysis.  We  have 
endeavored  to  follow  this  policy  throughout  the  research  in  this  thesis. 

The research in this thesis also points toward several lines of future work. The
most important immediate work to be done is in finding a suitable way to implement
a controller over a given communication graph. Let the graph induced by a controller
as described in Section 2.4.1 be denoted $\mathcal{G}_i$. Let the communication graph be denoted
$\mathcal{G}_c$. If the graph induced by the controller is a subgraph of the communication graph,
$\mathcal{G}_i \subset \mathcal{G}_c$, then the controller is feasible. Of course it may very well be the case
that the controller is not feasible given the communication graph, in which case an
approximation must be used. Two methods are suggested in this thesis: 1) each
robot computes its controller using only the information from neighbors available
to it, and 2) each robot maintains an estimator of the positions of all the robots
required to compute its controller, and makes its computations using these estimates.
In the future the stability and robustness properties of these methods should be
characterized and other methods should be investigated.
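
The feasibility condition above amounts to an edge-set inclusion check, which the following Python sketch makes concrete. It assumes both graphs are undirected and share the same set of robots. The function names and the edge-list representation are hypothetical choices for illustration.

```python
def undirected(edges):
    """Normalize an edge list so that (a, b) and (b, a) are the same edge."""
    return {frozenset(e) for e in edges}


def is_feasible(controller_edges, comm_edges):
    """True when every link the controller needs is a communication link."""
    return undirected(controller_edges).issubset(undirected(comm_edges))


def missing_links(controller_edges, comm_edges):
    """Links the controller needs but the network does not provide."""
    return undirected(controller_edges) - undirected(comm_edges)


if __name__ == "__main__":
    comm = [(1, 2), (2, 3)]
    print(is_feasible([(2, 1)], comm))    # robot 1 and 2 can talk directly
    print(missing_links([(1, 3)], comm))  # robot 1 and 3 cannot
```

When `missing_links` is non-empty, one of the two approximations described above is needed for the robots involved.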

Also, our recognition that coverage problems stem from nonconvex optimizations
suggests some new research directions. Gradient descent controllers, which are the
most common type in the multi-robot control literature, in general can only be expected
to find local minima of nonconvex cost functions. Therefore it is worthwhile to
look for special cases of multi-robot cost functions that might allow for global minima
to be reached with gradient controllers. For example, if the cost function has a single
minimum despite being nonconvex, it may be possible to prove convergence to that
minimum with a gradient controller. As another example, if all of the minima are
global (they all take on the same minimal value of $\mathcal{H}$) then gradient controllers will
find global minima. Alternatively, we are motivated to consider other nonconvex
optimization methods besides gradient descent that can be implemented in a multi-robot
setting.
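
The dependence of gradient descent on its starting point can be seen on a one-dimensional toy problem, sketched here in Python. The cost function below is a hypothetical tilted double well chosen for illustration, not a cost function from this thesis, and the step size and iteration count are arbitrary but stable choices.

```python
def cost(x):
    """Tilted double well: two minima with different cost values."""
    return (x * x - 1.0) ** 2 + 0.3 * x


def grad(x):
    """Derivative of cost with respect to x."""
    return 4.0 * x * (x * x - 1.0) + 0.3


def descend(x0, step=0.05, n_steps=500):
    """Plain gradient descent from the initial point x0."""
    x = x0
    for _ in range(n_steps):
        x -= step * grad(x)
    return x


if __name__ == "__main__":
    for x0 in (0.5, -0.5):
        x = descend(x0)
        print(f"start {x0:+.1f} -> x = {x:+.4f}, cost = {cost(x):+.4f}")
```

The run started on the right settles in the shallower well with the higher cost, while the run started on the left reaches the deeper one. Removing the tilt term `0.3 * x` makes both wells equally deep, which corresponds to the special case noted above in which every minimum is global.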

Another direction for future work is to move beyond the class of cost functions
considered here. It would be interesting to consider cost functions that depend, for
example, upon the time history of the robots, as in optimal control, or that incorporate
more complicated robot dynamics than the simple integrator dynamics. Cost functions
such as the one in this thesis that only depend upon the positions of the robots
lead to behaviors in which the robots move to a configuration and remain fixed. An
expanded class of cost functions would lead to more complex, dynamic multi-robot
behaviors.

In the case of system identification for groups of dynamical agents, an important
problem to address in the future is capturing the modal changes in the dynamics of
groups of agents over long time scales. We collected data at 1 Hz continuously over
several days for the cows in Chapter 7, but as discussed previously, we only expected
our model to describe the cow group dynamics over an interval of approximately an
hour, during which time the group is in a single behavioral mode. In the future,
we would like to broaden the model class to include switching state-space models.
That is, we would model both the motion of the group while it is in one mode
and the transitions among modes. With such a model structure we expect to be
able to capture the behavior of the cow group over extended periods of time and to
be able to model other natural and artificial groups that exhibit modal properties
(e.g. traffic motion, which is congested during rush hour and less so at other times).
Unfortunately, exact system identification is known to be intractable for switching
state-space models [19]. A topic of current research in the system identification
and learning communities is to find approximately optimal parameters using, e.g.
variational approaches [36], or Markov-Chain Monte Carlo (MCMC) methods [69].

We expect that these open questions will motivate new results and new insights
for multi-robot control. We hope that the gradient optimization approach in this
thesis will yield new insights and enable new multi-robot control algorithms. We
also hope that our emphasis on incorporating learning in multi-robot systems will
be a step toward multi-robot technologies that interact flexibly with an uncertain
world, gathering information from their environment to proceed toward a common
goal. The way forward for multi-robot control is to unify the diverse specialized
results which abound in the field, and to work toward a simple policy to create
multi-robot controllers that will help us to monitor and manipulate our environment for
the better.


Appendix A


Proofs  of  Lemmas 


Lemma A.1 (Uniform Continuity for Basic Controller) For the basic controller,
$\dot{V}$ is uniformly continuous.


Proof A.1 We will bound the time derivatives of a number of quantities. A bounded
derivative is sufficient for uniform continuity. Firstly, notice that $\hat{C}_{V_i}, p_i \in V_i \subset Q$,
so $\hat{C}_{V_i}$ and $p_i$ are bounded, which implies $\dot{p}_i = K(\hat{C}_{V_i} - p_i)$ is bounded. Consider terms
of the form

$$\frac{d}{dt}\left(\int_{V_i} f(q,t)\,dq\right), \qquad (A.1)$$

where $f(q,t)$ is a bounded function with a bounded time derivative $\frac{d}{dt} f(q,t)$. We have

$$\frac{d}{dt}\left(\int_{V_i} f(q,t)\,dq\right) = \int_{V_i} \frac{d}{dt} f(q,t)\,dq + \int_{\partial V_i} f(q,t)\, n_{\partial V_i}^T \sum_{j=1}^{n} \frac{\partial(\partial V_i)}{\partial p_j}\,\dot{p}_j\,dq, \qquad (A.2)$$

where $\partial V_i$ is the boundary of $V_i$ and $n_{\partial V_i}$ is the outward facing normal of the boundary.
Now $\frac{\partial(\partial V_i)}{\partial p_j}$ is bounded for all $j$, $\dot{p}_j$ was already shown to be bounded, and $f(q,t)$ is
bounded by assumption, therefore $\frac{d}{dt}\left(\int_{V_i} f(q,t)\,dq\right)$ is bounded.

Notice that $\dot{\hat{C}}_{V_i}$ is composed of terms of this form, so it is bounded. Therefore
$\ddot{p}_i = K(\dot{\hat{C}}_{V_i} - \dot{p}_i)$ is bounded, and $\dot{p}_i$ is uniformly continuous.

Now consider

$$\dot{V} = \sum_{i=1}^{n} \left[ -\int_{V_i} (q - p_i)^T \phi(q)\,dq\,\dot{p}_i + \tilde{a}_i^T \Gamma^{-1} \dot{\hat{a}}_i \right]. \qquad (A.3)$$

The first term inside the sum is uniformly continuous since it is the product of two
quantities which were already shown to have bounded time derivatives, namely
$\int_{V_i} (q - p_i)^T \phi(q)\,dq$ (an integral of the form (A.2)) and $\dot{p}_i$. Now consider the second term in
the sum. It is continuous in time since $\dot{\hat{a}}_i$ is continuous. Expanding it using (4.11)
and (4.13) as

$$\tilde{a}_i^T \Gamma^{-1} (I_{\mathrm{proj}_i} - I)\left(F_i \hat{a}_i + \gamma(\Lambda_i \hat{a}_i - \lambda_i)\right) \qquad (A.4)$$

shows that it is not differentiable where the matrix $I_{\mathrm{proj}_i}$ switches. However, the
switching condition (4.15) is such that $\dot{\hat{a}}_i(t)$ is not differentiable only at isolated points
on the domain $[0, \infty)$. Also, at all points where it is differentiable, its time derivative
is uniformly bounded (since $\dot{\hat{a}}_i$ and the integrands of $\Lambda_i$ and $\lambda_i$ are bounded, and $F_i$ is
composed of the kind of integral terms of the form (A.2)). This implies that $\tilde{a}_i^T \Gamma^{-1} \dot{\hat{a}}_i$
is uniformly continuous. We conclude that $\dot{V}$ is uniformly continuous.
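
The proof leans repeatedly on the fact that a bounded derivative gives uniform continuity, including when differentiability fails at isolated points. A short sketch of that step follows, with the generic function $g$, the bound $M$, and the tolerance $\delta$ introduced here purely for illustration.

```latex
g \text{ continuous on } [0,\infty), \quad
|\dot{g}(t)| \le M \ \text{wherever } \dot{g}(t) \text{ exists}
\;\Longrightarrow\;
|g(t) - g(s)| \le M\,|t - s| \quad \text{for all } s, t \ge 0.
% Apply the mean value theorem on each interval between consecutive
% points of non-differentiability and add up the pieces using continuity.
% Hence for any \epsilon > 0, the choice \delta = \epsilon / M works for every t.
```

Because the same $\delta$ serves at every time $t$, the continuity is uniform, which is the property a Barbalat-type argument requires.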


Lemma A.2 (Uniform Continuity for Consensus Controller) For the consensus
controller, $\dot{V}$ is uniformly continuous.


Proof A.2 We have

$$\dot{V} = \sum_{i=1}^{n} \left[ -\int_{V_i} (q - p_i)^T \phi(q)\,dq\,\dot{p}_i + \tilde{a}_i^T \Gamma^{-1} \dot{\hat{a}}_i \right], \qquad (A.5)$$

therefore the reasoning of the proof of Lemma A.1 applies as long as $\dot{\hat{a}}_i$ can be shown to
be uniformly continuous. But $\dot{\hat{a}}_i$ only differs from the basic controller in the presence
of the term

$$\zeta \sum_{j=1}^{n} l_{ij}(\hat{a}_i - \hat{a}_j). \qquad (A.6)$$

The Voronoi edge length, $l_{ij}$, is a continuous function of $p_k$, $k \in \{1, \ldots, n\}$. Furthermore,
where it is differentiable, it has uniformly bounded derivatives. It was shown
in the proof of Lemma A.1 that $\dot{p}_k$ is bounded, so similarly to $\dot{\hat{a}}_i$, the points at which
$l_{ij}(p_1(t), \ldots, p_n(t))$ is not differentiable are isolated points on $[0, \infty)$. Therefore $l_{ij}$
is uniformly continuous in time. All other terms in $\dot{\hat{a}}_{\mathrm{pre}_i}$ were previously shown to
be uniformly continuous, so $\dot{\hat{a}}_{\mathrm{pre}_i}$ is uniformly continuous. As shown in the proof
of Lemma A.1, the projection operation preserves uniform continuity, therefore $\dot{\hat{a}}_i$ is
uniformly continuous.